| title | Spatially-explicit logistic regression |
|---|
The R Script associated with this page is available here. Download this file and open it (or copy-paste into a new script) with RStudio so you can follow along.
library(knitr)
library(raster)
library(rasterVis)
library(dplyr)
library(ggplot2)
# devtools::install_github("dkahle/ggmap")
library(ggmap)
library(rgdal)
library(rgeos)
library(tidyr)
library(sf)
library(leaflet)
library(DT)
library(widgetframe)
## New Packages
library(mgcv) # package for Generalized Additive Models
library(ncf) # has an easy function for correlograms
library(grid)
library(gridExtra)
library(xtable)
library(maptools)- To demonstrate a simple presence/absence modelling in spatial context.
- To model spatial dependence (autocorrelation) in the response.
Overview of R's spatial toolset is here.
Note: this is not meant to be an exhaustive introduction to species distribution modelling.
Today we will model space by smooth splines in mgcv package.
Examples of Alternative approaches:
- Simple polynomials
- Eigenvector Mapping:
vegan,spdep - Predictive process:
spbayes
Methods that tweak variance-covariance matrix of Multivariate Normal Distribution:
- Generalized Least Squares:
MASS,nlme - Autoregressive Models:
spdep - GeoBUGS module in OpenBUGS
See Dormann et al. 2007 Ecography, 30: 609-628 for a review.
We'll attempt to explain the spatial distribution of the Purple finch (Carpodacus purpureus) in San Diego county, California:
Load a vector dataset (shapefile) representing the San Diego bird atlas data for the Purple finch:
finch <- read_sf(system.file("extdata", "finch",
package = "DataScienceData"),
layer="finch")
st_crs(finch)="+proj=utm +zone=11 +ellps=GRS80 +datum=NAD83 +units=m +no_defs "Plot the finch dataset in leaflet.
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
leaflet() %>% addTiles() %>%
addPolygons()%>%
frameWidget(height=400)But we can do better than that. Let's add a couple layers and an overview map.
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
leaflet() %>% addTiles() %>%
addPolygons(label=paste(finch$BLOCKNAME," (NDVI=",finch$ndvi,")"),
group = "NDVI",
color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ~colorQuantile("YlOrRd", ndvi)(ndvi),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE)) %>%
addPolygons(label=paste(finch$BLOCKNAME," (NDVI=",finch$ndvi,")"),
group = "Presence/Absence",
color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ifelse(finch$present,"red","transparent"),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE)) %>%
addLayersControl(
baseGroups = c("NDVI", "Presence/Absence"),
options = layersControlOptions(collapsed = FALSE)
)%>%
addMiniMap()%>%
frameWidget(height = 600)Explore the other variables in the finch dataset with summary(finch). Build on the map above to add the mean elevation (meanelev) in each polygon as an additional layer.
Show Solution
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
leaflet() %>% addTiles() %>%
addPolygons(label=paste(finch$BLOCKNAME," (NDVI=",finch$ndvi,")"),
group = "NDVI",
color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ~colorQuantile("YlOrRd", ndvi)(ndvi),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE)) %>%
addPolygons(label=paste(finch$BLOCKNAME," (Elevation=",finch$meanelev,")"),
group = "Elevation",
color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ~colorQuantile("YlOrRd", meanelev)(meanelev),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE)) %>%
addPolygons(label=paste(finch$BLOCKNAME," (NDVI=",finch$ndvi,")"),
group = "Presence/Absence",
color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ifelse(finch$present,"red","transparent"),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE)) %>%
addLayersControl(
baseGroups = c("NDVI", "Elevation","Presence/Absence"),
options = layersControlOptions(collapsed = FALSE)
)%>%
addMiniMap()%>%
frameWidget(height=600)You could also visualize these data with multiple ggplot panels:
p1=ggplot(finch) +
scale_fill_gradient2(low="blue",mid="grey",high="red")+
coord_equal()+
ylab("")+xlab("")+
theme(legend.position = "right")+
theme(axis.ticks = element_blank(), axis.text = element_blank())
p1a=p1+geom_sf(aes(fill = ndvi))## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
p1b=p1+geom_sf(aes(fill = meanelev))## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
p1c=p1+geom_sf(aes(fill = urban))## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
p1d=p1+geom_sf(aes(fill = maxtmp))## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
grid.arrange(p1a,p1b,p1c,p1d,ncol=1) Now look at the associated data frame (analogous to the *.dbf file that accompanies a shapefile):
datatable(finch, options = list(pageLength = 5))%>%
frameWidget(height=400)Note: in your final projects, don't simply print out large tables or outputs... Filter/select only data relevent to tell your 'story'...
Statistical models generally perform better when covariates have a mean of zero and variance of 1. We can quickly calculate this using the scale() function:
First let's select only the columns we will use for modeling.
finch=mutate(finch,ndvi_scaled=as.numeric(scale(ndvi)))Compare three models:
- Only NDVI
- Only Space
- Space and NDVI
Now we will do the actual modelling. The first simple model links the probability of a presences or absences to NDVI.
Note: this assumes residuals are iid (independent and identically distributed).
It can be fitted by simple glm() in R:
ndvi.only <- glm(present~ndvi_scaled,
data=finch, family="binomial")Extract predictions and residuals:
finch$m_pred_ndvi <- predict(ndvi.only, type="response")
finch$m_resid_ndvi <- residuals(ndvi.only)Plot the estimated logistic curve:
ggplot(finch,aes(x=ndvi/256,y=m_pred_ndvi))+
geom_line(col="red")+
geom_point(mapping=aes(y=present))+
xlab("NDVI")+
ylab("P(presence)")Print a summary table:
xtable(ndvi.only,
caption="Model summary for 'NDVI-only'")%>%
print(type="html")| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -2.9388 | 0.2960 | -9.93 | 0.0000 |
| ndvi_scaled | 2.6521 | 0.3223 | 8.23 | 0.0000 |
The second model fits only the spatial trend in the data (using GAM and splines):
space.only <- gam(present~s(X_CEN, Y_CEN),
data=finch, family="binomial")Extract the predictions and residuals
finch$m_pred_space <- as.numeric(predict(space.only, type="response"))
finch$m_resid_space <- residuals(space.only)Plot the spatial term of the model:
finch$m_space=as.numeric(predict(space.only,type="terms"))
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
leaflet() %>% addTiles() %>%
addPolygons(color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ~colorQuantile("YlOrRd", m_space)(m_space),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE))%>%
frameWidget(height=200)Print a summary table
xtable(summary(space.only)$s.table,
caption="Model summary for 'Space-only'")%>%
print(type="html")| edf | Ref.df | Chi.sq | p-value | |
|---|---|---|---|---|
| s(X_CEN,Y_CEN) | 28.83 | 28.98 | 51.06 | 0.01 |
The third model uses both the NDVI and spatial trends to explain the finch's occurrences:
space.and.ndvi <- gam(present~ndvi + s(X_CEN, Y_CEN),
data=finch, family="binomial")
## extracting predictions and residuals:
finch$m_pred_spacendvi <- as.numeric(predict(space.and.ndvi, type="response"))
finch$m_resid_spacendvi <- residuals(space.and.ndvi)Print a summary table
xtable(summary(space.and.ndvi)$s.table,
caption="Model summary for 'Space and NDVI'")%>%
print(type="html")| edf | Ref.df | Chi.sq | p-value | |
|---|---|---|---|---|
| s(X_CEN,Y_CEN) | 23.35 | 25.84 | 47.74 | 0.01 |
Plot the spatial term of the model:
finch$m_ndvispace=as.numeric(predict(space.and.ndvi,type="terms")[,2])
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
ggplot(aes(x=X_CEN,y=Y_CEN)) +
geom_sf(aes(fill = m_ndvispace))+
geom_point(aes(col=as.logical(present)))+
scale_fill_gradient2(low="blue",mid="grey",high="red",name="Spatial Effects")+
scale_color_manual(values=c("transparent","black"),name="Present")Now let's put all of the predictions together into a single long table:
p1=st_transform(finch,"+proj=longlat +datum=WGS84")%>%
ggplot()+
scale_fill_gradient2(low="blue",mid="grey",high="red")+
scale_color_manual(values=c("transparent","black"),name="Present",guide="none")+
coord_equal()+
ylab("")+xlab("")+
theme(legend.position = "right")+
theme(axis.ticks = element_blank(), axis.text = element_blank())
pts=geom_point(data=finch,aes(x=X_CEN,y=Y_CEN,col=as.logical(present)),size=.5)
p1a=p1+geom_sf(aes(fill = m_pred_spacendvi))+pts## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
p1b=p1+geom_sf(aes(fill = m_pred_space))+pts## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
p1c=p1+geom_sf(aes(fill = m_pred_ndvi))+pts## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
grid.arrange(p1a,p1b,p1c,ncol=1) We can compare model performance of the models with Akaike's Information Criterion (AIC). This uses the formula $AIC=-2log-likelihood + knpar$, where
-
$npar$ number of parameters in the fitted model -
$k = 2$ penalty per parameter
Lower is better...
datatable(AIC(ndvi.only,
space.only,
space.and.ndvi))| <\/th>\n | df<\/th>\n | AIC<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[1,2]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false},"selection":{"mode":"multiple","selected":null,"target":"row"}},"evals":[],"jsHooks":[]}</script>
Spatial Autocorrelation of ResidualsShould always check the spatial correlation in model residuals to evaluate assumptions. We will use the function inc=10000 #spatial increment of correlogram in m
# add coordinates of each polygon's centroid to the sf dataset
finch[,c("x","y")]=st_centroid(finch)%>%st_coordinates()#use by() in dplyr package to compute a correlogram for each parameter
cor=finch%>%
dplyr::select(y,x,contains("resid"),present)%>%
gather(key = "key", value = "value",contains("resid"),present,-y,-x)%>%
group_by(key)%>%
do(var=.$key,cor=correlog(.$x,.$y,.$value,increment=inc, resamp=100,quiet=T))%>%
do(data.frame(
key=.$key[[1]],
Distance = .$cor$mean.of.class/1000,
Correlation=.$cor$correlation,
pvalue=.$cor$p, stringsAsFactors=F))And we can plot the correlograms: ggplot(cor,aes(x=Distance,y=Correlation,col=key,group=key))+
geom_point(aes(shape=pvalue<=0.05))+
geom_line()+
xlab("Distance (km)")+ylab("Spatial\nAuto-correlation")What did we gain by making the model "spatially explicit"?
## Your turn
Try adding additional co-variates into the spatial model (e.g. elevation or climate). Show Solution m1 <- gam(present~ndvi+meanelev+
wintert+meanppt+urban +
s(X_CEN, Y_CEN),
data=finch, family="binomial")
m2 <- gam(present~ndvi+meanppt +
s(X_CEN, Y_CEN),
data=finch, family="binomial")Print a summary table xtable(summary(m1)$p.table)%>%
print(type="html")
Compare all models datatable(AIC(ndvi.only,
space.only,
space.and.ndvi,
m1,m2))
|
|---|





