Linear and multiple regression in GRASS GIS
To illustrate linear and multiple regression, we analyse the relationship between mean temperature, elevation and latitude.
Linear regression
The objective here is to examine a dataset in terms of linear relationship. Specifically, we analyse European annual long-term mean temperatures, elevation and location as expressed by latitude. For this purpose GRASS GIS provides r.regression.line.
Enter in a terminal window:
g.region raster=elev_v17 -p
# create linear regression model
r.regression.line mapx=elev_v17 mapy=tmean.1981_2010.annual.avg
# since r.regression.line does not apply the model, we need to assess the model output
# and then apply it with r.mapcalc.
# re-run with machine-parsable output (-g flag), printed into the terminal
r.regression.line mapx=elev_v17 mapy=tmean.1981_2010.annual.avg -g
# re-run model, now with machine-parsed output (-g flag and eval shell function)
## eval is a nice trick for transferring module output into the script workflow
eval `r.regression.line mapx=elev_v17 mapy=tmean.1981_2010.annual.avg -g`
echo "a (Offset): $a"
echo "b (Gain): $b"
# Apply the model using the model variables (like the predict() function in R)
r.mapcalc "tmean_model_linear = $a + $b * elev_v17"
# Calculate a difference map: linear model versus observed map
r.mapcalc "tmean_model_diff_linear = tmean_model_linear - tmean.1981_2010.annual.avg"
# Apply differences color palette
r.colors tmean_model_diff_linear color=differences
# visualize
## hint: CTRL-R to find previous commands by typing a part of it; ENTER to select
d.erase
d.rast tmean_model_diff_linear
d.vect map=country_boundaries type=boundary
d.legend -t -s -b -d raster=tmean_model_diff_linear title="Diffs Temp [deg C] linear regression" title_fontsize=20 font=sans fontsize=18
# univariate statistics
r.univar tmean_model_diff_linear
TASK: Does this simple linear model explain anything?
Multiple regression
Since a simple linear model may not sufficiently explain the relationship between European annual long-term mean temperatures, elevation and location as expressed by latitude we also perform a multiple regression analysis (y = b0 + b1*x1 + b2*x2 + ... + bn*xn + e
).
The r.regression.multi module is designed for large datasets that can not be easily processed in R
. Note that a p
value is therefore not provided, because even very small, meaningless effects will become significant with a large number of cells.
Enter in a terminal window:
# Create a latitude map, setting first the pixel geometry to an existing map (a.k.a. comp region)
g.region raster=elev_v17 -p
r.mapcalc "latitude = y()"
# create multiple linear regression model, create estimates and residuals maps
r.regression.multi mapx=elev_v17,latitude mapy=tmean.1981_2010.annual.avg estimates=tmean_model_multi residuals=tmean_model_multi_resid
# Calculate a difference map
r.mapcalc "tmean_model_diff_multi = tmean_model_multi - tmean.1981_2010.annual.avg"
# Apply differences color palette
r.colors tmean_model_diff_multi color=differences
d.erase
d.rast tmean_model_diff_multi
d.vect map=country_boundaries type=boundary
d.legend -t -s -b -d raster=tmean_model_diff_multi title="Diffs Temp [deg C] multi-regression" title_fontsize=20 font=sans fontsize=18
d.erase
d.rast tmean_model_multi_resid
d.vect map=country_boundaries type=boundary
d.legend -t -s -b -d raster=tmean_model_multi_resid title="Residuals of Temp [deg C] multi-regression" title_fontsize=20 font=sans fontsize=18
Fig: Multiple linear regression model
Visualize the model outputs and difference maps. Please check the manual of r.regression.line and r.regression.multi.
Using R within GRASS GIS
To use R
functionalities within GRASS GIS, refer to here.
Back to home.