---
title: "Spatially-explicit logistic regression"
---
```{r echo=FALSE, message=FALSE, results='hide', purl=FALSE, cache=F}
source("knitr_header.R")
```
[ The R Script associated with this page is available here](`r output`). Download this file and open it (or copy-paste into a new script) with RStudio so you can follow along.
```{r cache=F,message=F,warning=FALSE}
library(knitr)
library(raster)
library(rasterVis)
library(dplyr)
library(ggplot2)
# devtools::install_github("dkahle/ggmap")
library(ggmap)
library(rgdal)
library(rgeos)
library(tidyr)
library(sf)
library(leaflet)
library(DT)
library(widgetframe)
## New Packages
library(mgcv) # package for Generalized Additive Models
library(ncf) # has an easy function for correlograms
library(grid)
library(gridExtra)
library(xtable)
library(maptools)
```
## Goal of this class
* To demonstrate a simple presence/absence modelling in spatial context.
* To model spatial dependence (autocorrelation) in the response.
Overview of [R's spatial toolset is here](http://cran.r-project.org/web/views/Spatial.html).
Note: this is _not_ meant to be an exhaustive introduction to species distribution modelling.
## Modeling spatial autocorrelation
Today we will model space by **smooth splines** in `mgcv` package.
Examples of Alternative approaches:
- Simple polynomials
- Eigenvector Mapping: ```vegan```, ```spdep```
- Predictive process: ```spbayes```
Methods that tweak variance-covariance matrix of **Multivariate Normal Distribution**:
- Generalized Least Squares: ```MASS```, ```nlme```
- Autoregressive Models: ```spdep```
- GeoBUGS module in OpenBUGS
See [Dormann et al. 2007 Ecography, 30: 609-628](http://onlinelibrary.wiley.com/doi/10.1111/j.2007.0906-7590.05171.x/full) for a review.
## Species Distribution Modeling
We'll attempt to explain the spatial distribution of the Purple finch (_Carpodacus purpureus_) in San Diego county, California:

(photo/Wikimedia)
## Preparing the data
Load a vector dataset (shapefile) representing the [San Diego bird atlas data](http://sdplantatlas.org/BirdAtlas/BirdPages.htm) for the Purple finch:
```{r, message=FALSE, warning=FALSE}
finch <- read_sf(system.file("extdata", "finch",
package = "DataScienceData"),
layer="finch")
st_crs(finch)="+proj=utm +zone=11 +ellps=GRS80 +datum=NAD83 +units=m +no_defs "
```
### Plot the shapefile
Plot the finch dataset in leaflet.
```{r message=F}
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
leaflet() %>% addTiles() %>%
addPolygons()%>%
frameWidget(height=400)
```
But we can do better than that. Let's add a couple layers and an overview map.
```{r message=F}
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
leaflet() %>% addTiles() %>%
addPolygons(label=paste(finch$BLOCKNAME," (NDVI=",finch$ndvi,")"),
group = "NDVI",
color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ~colorQuantile("YlOrRd", ndvi)(ndvi),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE)) %>%
addPolygons(label=paste(finch$BLOCKNAME," (NDVI=",finch$ndvi,")"),
group = "Presence/Absence",
color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ifelse(finch$present,"red","transparent"),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE)) %>%
addLayersControl(
baseGroups = c("NDVI", "Presence/Absence"),
options = layersControlOptions(collapsed = FALSE)
)%>%
addMiniMap()%>%
frameWidget(height = 600)
```
## Your turn
Explore the other variables in the `finch` dataset with `summary(finch)`. Build on the map above to add the mean elevation (`meanelev`) in each polygon as an additional layer.
You could also visualize these data with multiple ggplot panels:
```{r,fig.height=20}
p1=ggplot(finch) +
scale_fill_gradient2(low="blue",mid="grey",high="red")+
coord_equal()+
ylab("")+xlab("")+
theme(legend.position = "right")+
theme(axis.ticks = element_blank(), axis.text = element_blank())
p1a=p1+geom_sf(aes(fill = ndvi))
p1b=p1+geom_sf(aes(fill = meanelev))
p1c=p1+geom_sf(aes(fill = urban))
p1d=p1+geom_sf(aes(fill = maxtmp))
grid.arrange(p1a,p1b,p1c,p1d,ncol=1)
```
## Explore the data
Now look at the associated data frame (analogous to the *.dbf file that accompanies a shapefile):
```{r}
datatable(finch, options = list(pageLength = 5))%>%
frameWidget(height=400)
```
> Note: in your final projects, don't simply print out large tables or outputs... Filter/select only data relevent to tell your 'story'...
## Scaling and centering the environmental variables
Statistical models generally perform better when covariates have a mean of zero and variance of 1. We can quickly calculate this using the `scale()` function:
First let's select only the columns we will use for modeling.
```{r}
finch=mutate(finch,ndvi_scaled=as.numeric(scale(ndvi)))
```
## Fitting the models
Compare three models:
1. Only NDVI
2. Only Space
3. Space and NDVI
### Model 1 - only NDVI
Now we will do the actual modelling. The first simple model links the probability of a presences or absences to NDVI.
$$ \log(p_i/1-p_i)=\beta_0+\beta_1 NDVI_i $$
$$ o_i \sim Bernoulli(p_i) $$
> Note: this assumes residuals are _iid_ (independent and identically distributed).
It can be fitted by simple glm() in R:
```{r}
ndvi.only <- glm(present~ndvi_scaled,
data=finch, family="binomial")
```
Extract predictions and residuals:
```{r}
finch$m_pred_ndvi <- predict(ndvi.only, type="response")
finch$m_resid_ndvi <- residuals(ndvi.only)
```
Plot the estimated logistic curve:
```{r, fig.height=5}
ggplot(finch,aes(x=ndvi/256,y=m_pred_ndvi))+
geom_line(col="red")+
geom_point(mapping=aes(y=present))+
xlab("NDVI")+
ylab("P(presence)")
```
Print a summary table:
```{r, results='asis'}
xtable(ndvi.only,
caption="Model summary for 'NDVI-only'")%>%
print(type="html")
```
### Model 2 - only space
The second model fits only the spatial trend in the data (using GAM and splines):
```{r}
space.only <- gam(present~s(X_CEN, Y_CEN),
data=finch, family="binomial")
```
Extract the predictions and residuals
```{r}
finch$m_pred_space <- as.numeric(predict(space.only, type="response"))
finch$m_resid_space <- residuals(space.only)
```
Plot the ***spatial term*** of the model:
```{r, fig.height=7, fig.width=14}
finch$m_space=as.numeric(predict(space.only,type="terms"))
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
leaflet() %>% addTiles() %>%
addPolygons(color = "#444444",
weight = 0.1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
fillColor = ~colorQuantile("YlOrRd", m_space)(m_space),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE))%>%
frameWidget(height=200)
```
Print a summary table
```{r, results='asis'}
xtable(summary(space.only)$s.table,
caption="Model summary for 'Space-only'")%>%
print(type="html")
```
### Model 3 - space and NDVI
The third model uses both the NDVI and spatial trends to explain the finch's occurrences:
```{r}
space.and.ndvi <- gam(present~ndvi + s(X_CEN, Y_CEN),
data=finch, family="binomial")
## extracting predictions and residuals:
finch$m_pred_spacendvi <- as.numeric(predict(space.and.ndvi, type="response"))
finch$m_resid_spacendvi <- residuals(space.and.ndvi)
```
Print a summary table
```{r, results='asis'}
xtable(summary(space.and.ndvi)$s.table,
caption="Model summary for 'Space and NDVI'")%>%
print(type="html")
```
Plot the ***spatial term*** of the model:
```{r, fig.height=7, fig.width=14}
finch$m_ndvispace=as.numeric(predict(space.and.ndvi,type="terms")[,2])
st_transform(finch,"+proj=longlat +datum=WGS84")%>%
ggplot(aes(x=X_CEN,y=Y_CEN)) +
geom_sf(aes(fill = m_ndvispace))+
geom_point(aes(col=as.logical(present)))+
scale_fill_gradient2(low="blue",mid="grey",high="red",name="Spatial Effects")+
scale_color_manual(values=c("transparent","black"),name="Present")
```
## Examine the fitted models
Now let's put all of the predictions together into a single _long_ table:
```{r,fig.height=10}
p1=st_transform(finch,"+proj=longlat +datum=WGS84")%>%
ggplot()+
scale_fill_gradient2(low="blue",mid="grey",high="red")+
scale_color_manual(values=c("transparent","black"),name="Present",guide="none")+
coord_equal()+
ylab("")+xlab("")+
theme(legend.position = "right")+
theme(axis.ticks = element_blank(), axis.text = element_blank())
pts=geom_point(data=finch,aes(x=X_CEN,y=Y_CEN,col=as.logical(present)),size=.5)
p1a=p1+geom_sf(aes(fill = m_pred_spacendvi))+pts
p1b=p1+geom_sf(aes(fill = m_pred_space))+pts
p1c=p1+geom_sf(aes(fill = m_pred_ndvi))+pts
grid.arrange(p1a,p1b,p1c,ncol=1)
```
## Model comparison
We can compare model performance of the models with Akaike's Information Criterion (AIC). This uses the formula $AIC=-2*log-likelihood + k*npar$, where
* $npar$ number of parameters in the fitted model
* $k = 2$ penalty per parameter
Lower is better...
```{r}
datatable(AIC(ndvi.only,
space.only,
space.and.ndvi))
```
## Spatial Autocorrelation of Residuals
Should always check the spatial correlation in model residuals to evaluate assumptions. We will use the function ```correlog``` from the ```ncf``` package.
```{r, message=F,results='hide'}
inc=10000 #spatial increment of correlogram in m
# add coordinates of each polygon's centroid to the sf dataset
finch[,c("x","y")]=st_centroid(finch)%>%st_coordinates()
#use by() in dplyr package to compute a correlogram for each parameter
cor=finch%>%
dplyr::select(y,x,contains("resid"),present)%>%
gather(key = "key", value = "value",contains("resid"),present,-y,-x)%>%
group_by(key)%>%
do(var=.$key,cor=correlog(.$x,.$y,.$value,increment=inc, resamp=100,quiet=T))%>%
do(data.frame(
key=.$key[[1]],
Distance = .$cor$mean.of.class/1000,
Correlation=.$cor$correlation,
pvalue=.$cor$p, stringsAsFactors=F))
```
And we can plot the correlograms:
```{r}
ggplot(cor,aes(x=Distance,y=Correlation,col=key,group=key))+
geom_point(aes(shape=pvalue<=0.05))+
geom_line()+
xlab("Distance (km)")+ylab("Spatial\nAuto-correlation")
```
## What did we gain by making the model "spatially explicit"?
- We know that the effect of NDVI is not artificially amplified by pseudoreplication.
- We have more realistic predictions.
- We have a fitted surface that can be interpreted -- perhaps to guide us towards some additional spatially-structured predictors that can be important.
## Your turn
Try adding additional co-variates into the spatial model (e.g. elevation or climate).