Here, we present the differences between a classic SDM and our trophic SDM, starting from the simplest case when these two approaches coincide to more realistic cases where they do not.
Let’s consider a very simple case with one environmental covariate x, and two species yA and yB. We assume species yA to be basal and species yB to feed on yA. This leads to the following directed acyclic graph that resumes the dependencies across variables, assuming bottom-up control.
library(igraph)
G=graph_from_edgelist(as.matrix(data.frame(from = c("x","x","A"),
to = c("A","B","B"))))
plot(G)
We now assume that data on species distributions are continuous
observable entities (e.g. basal area, biomass), that we can model with a
linear regression. We name yiA
and yiB the observations of
species yA and yB at site i, respectively, and we assume that the
environment has a quadratic effect on both species. Therefore, using
classical SDMs, model equations are:
yiA=βSDM0A+βSDM1Axi+βSDM2Ax2i+ϵiA, ϵiA∼N(0,σSDMA)yiB=βSDM0B+βSDM1Bxi+βSDM2Bx2i+ϵiB, ϵiB∼N(0,σSDMB)
Instead, our trophic model includes a dependence of yB on its prey yA:
yiA=βtSDM0A+βtSDM1Axi+βtSDM2Ax2i+ϵiA, ϵiA∼N(0,σtSDMA)yiB=βtSDM0B+βtSDM1Bxi+βtSDM2Bx2i+αyiA+ϵiB, ϵiB∼N(0,σtSDMB)
We hereafter focus on the expectation of the models (corresponding to
the predicted values), focusing in particular on species yB for which the two models are
different. For SDMs:
E[yiB| xi]=βSDM0B+βSDM1Bxi+βSDM2Bx2i For trophic SDM,
we need to condition on a value of yiA. We here choose to condition on
the expected value of yiA, which
corresponds to using predictions from yA to predict yB. In a linear model, with Gaussian
data and a linear interaction between prey and predator, we have that
EA[EB[yB| yA]]=EB[yB| E[yA]] (which is not necessarily true in other cases).
We thus have:
E[yiB| xi,E[yiA]]=βtSDM0B+βtSDM1Bxi+βtSDM2Bx2i+αE[yiA] We can rewrite the expected value of yB only as a function of the environment
x only. E[yiB| xi,E[yiA]]=βtSDM0B+βtSDM1Bxi+βtSDM2Bx2i+α(βtSDM0A+βtSDM1Axi+βtSDM2Ax2i)=βtSDM0B+αβtSDM0A+(βtSDM1B+αβtSDM1A)xi+(βtSDM2B+αβtSDM2A)x2i=βSDM0B+βSDM1Bxi+βSDM2Bx2i
In this very simple case of a linear regression model with a linear
interactions between the predator and the preys, the estimation of the
regression coefficients is such that βSDM.B=βtSDM.B+αβtSDM.A As a consequence the models estimate the
same realised niche, and predictions from trophic and classical SDMs
coincide.
However, our trophic model allows a separation of the direct effect of
the environment on yB (i.e., the
potential niche of yB) from its
indirect effect due to the effect of yA on yB. In other words, using the jargon of
structural equation models, βtSDM1B and βtSDM2B are the direct effects
of the environment on species yB
(i.e., its potential niche), the terms αβtSDM1A and αβtSDM2A are the indirect
effect of the environment on species yB. The sum of these two effects, βtSDM1B+αβtSDM1A and βtSDM2B+αβtSDM2A corresponds to the total effect of the
environment on yB, i.e., the
realised niche of yB. In this very
particular case, SDM can correctly estimate the total effect (the
realised niche), but cannot separate the direct and indirect effect of
the environment on yB and thus
they do not correctly estimate the potential niche.
We hereafter present a short example confirming the analytical derivation of these results.
First, we simulate species distribution data accordingly to the directed acyclic graph presented above. Therefore, the environment has a quadratic effect on species yA and species yB, with a linear effect of yA on yB.
set.seed(1712)
# Create the environmental gradient
X = seq(0,5,length.out=100)
# Simulate data for the basal species A
A = 3*X + 2*X^2 + rnorm(100, sd = 5)
# Simulate data for the predator species B
B = 2*X -3*X^2 + 3*A + rnorm(100, sd = 5)
oldpar = par()
par(mfrow=c(1,3))
plot(X,A, main = "Species A realised \n and potential niche")
plot(X,B, main = "Species B realised niche")
plot(X, 2*X -3*X^2, main = "Species B potential niche")
We now run both a classic SDM and a trophic SDM
# For species A, the two models coincide
m_A = lm(A ~ X + I(X^2))
# For species B, the two models are different:
## trophic model
m_trophic = lm(B ~ X + I(X^2) + A)
## classic SDM
m_SDM = lm(B ~ X + I(X^2))
We can now test the relationships between the coefficients of SDM and trophic SDM.
# Show the equality of coefficients
all.equal(coef(m_SDM), coef(m_trophic)[-4] + coef(m_trophic)[4]*coef(m_A))
#> [1] TRUE
# Prediction coincide
pred_SDM = m_SDM$fitted.values
pred_trophic = predict(m_trophic,
newdata = data.frame(X = X, A = m_A$fitted.values))
all.equal(pred_SDM, pred_trophic)
#> [1] TRUE
The coefficient of classic SDM for species yB are equal to the environmental coefficients of trophic SDM for species yB plus the indirect effect of the environment on species yB (given by the product of the biotic term and the environmental coefficients of species yA). As a consequence, the predictions of both models coincide when the predicted value of species yA is used to predict yB. However, unlike the traditional SDM, our trophic model allows to retrieve the correct potential niche of species yB
# Potential niche infered by SDM
plot(X, m_SDM$fitted.values, ylim = c(-65, 120), ylab = "B",
main = "Potential niche of yB")
points(X, cbind(1, X, X^2) %*% coef(m_trophic)[-4], col ="red")
points(X, 2*X -3*X^2, col ="blue")
legend(-0.1, 120, legend=c("Potential niche inferred by SDM",
"Potential niche inferred by trophic SDM",
"True fundamenta niche"), pch =1,
col=c("black","red", "blue"), cex = 0.5)
We showed that SDM and trophic SDM estimate the same realized niche, and thus lead to the same predictions, when all the following conditions hold:
However, as soon as at least one of these conditions does not hold, the estimations of the realized niche (and so model prediction) differ. We hereafter show that releasing just one of these constraints at a time is enough to obtain predictions from trophic SDM that are different, and better, than classical SDMs. Therefore, the example presented above is a very particular case that is useful to understand model similarities, but that rarely occurs in application.
We hereafter release the conditions on the linearity of the
relationship between yB and yA. Indeed, we now consider that species
yB has a quadratic response to
yA. The classic SDM does not
change, as we model species as a function of the environment only.
Instead, the trophic model for species yB becomes:
yiB=βtSDM0B+βtSDM1Bxi+βtSDM2Bx2i+αy2iA+ϵiB, ϵiB∼N(0,σtSDMB)
Predictions between trophic SDM and SDM are now different, since: E[yiB| xi,E[yiA]]=βtSDM0B+βtSDM1Bxi+βtSDM2Bx2i+α(βtSDM0A+βtSDM1Axi+βtSDM2Ax2i)2=βtSDM0B+α(βtSDM0A)2+(βtSDM1B+2αβtSDM0AβtSDM1A)xi +(βtSDM2B+α(2βtSDM0AβtSDM2A+(βtSDM1A)2))x2i+2αβtSDM1AβtSDM2Ax3i+α(βtSDM2A)2x4i≠βSDM0B+βSDM1Bxi+βSDM2Bx2i
In order to correctly infer the realised niche, classic SDMs should be specified with a polynomial of degree 4. As soon as we add multiple environmental variables and link between species, classic SDM should be specified with extremely complex formula, which is unfeasible for real application.
We provide a simple example to demonstrate the analytical computations with a simulation. We use the same example of above, but we now simulate a quadratic relationship of yB with respect to yA.
B = 2*X -3*X^2 + 3*A^2 + rnorm(100, sd = 5)
oldpar = par()
par(mfrow=c(1,3))
plot(X,A, main = "Species A realised \n and potential niche")
plot(X,B, main = "Species B realised niche")
plot(X, 2*X -3*X^2, main = "Species B potential niche")
We now compare the regression coefficients and predictions from the two models
# Show the equality of coefficients
all.equal(coef(m_SDM), coef(m_trophic)[-4] + coef(m_trophic)[4]*coef(m_A))
#> [1] "Mean relative difference: 0.185587"
# Prediction are now different
pred_SDM = m_SDM$fitted.values
pred_trophic = predict(m_trophic,
newdata = data.frame(X = X, A = m_A$fitted.values))
all.equal(pred_SDM, pred_trophic)
#> [1] "Mean relative difference: 53.61947"
#Compare R2
cor(B, pred_SDM)^2
#> [1] 0.8339402
cor(B, pred_trophic)^2
#> [1] 0.9605349
The relationship between the coefficients of the two models does not hold anymore more, and predictions are now different, with trophic SDM reaching a far better predictive accuracy than classic SDM for species yB (R2 of 0.83 and 0.96 respectively).
Let’s now consider here the case of presence-absences (i.e., binary
data), without modifying the other two conditions. Therefore, we use a
(generalized) linear model like logistic regression, and we model a
linear relationship (at the link level) between predator and prey. Thus,
for SDM we have:
p(yiA=1)=logit−1(βSDM0A+βSDM1Axi+βSDM2Ax2i)p(yiB=1)=logit−1(βSDM0B+βSDM1Bxi+βSDM2Bx2i)
Instead, our trophic model includes a dependence of yB on its prey yA:
p(yiB=1)=logit−1(βtSDM0B+βtSDM1Bxi+βtSDM2Bx2i+αyA) The equality between the models does not hold
anymore due to the non linearity of the logit function. Indeed: E[yiB| xi,E[yiA]]=p(yiB=1| xi,p(yiA=1))=logit−1(βtSDM0B+βtSDM1Bxi+βtSDM2Bx2i+αlogit−1(βtSDM0A+βtSDM1Axi+βtSDM2Ax2i))≠logit−1(βSDM0B+βSDM1Bxi+βSDM2Bx2i)
We now confirm this analytical derivation with a simple example. The code is very similar to the previous one, but we now generate binary data from a logistic model.
# Generate the link layer basal species
V_A= 3*X - 2*X^2
# Transform to probability
prob_A = 1/(1 + exp(-V_A))
# Sample the binary data
A = rbinom(n = 100, size = 1, prob = prob_A)
# Same for species B
V_B = -3*X +1*X^2 + 3*A
prob_B = 1/(1 + exp(-V_B))
B = rbinom(n = 100, size = 1, prob = prob_B)
oldpar = par()
par(mfrow=c(1,3))
plot(X,prob_A, main = "Species A realised \n and potential niche")
plot(X,prob_B, main = "Species B realised niche")
plot(X, 1/(1+exp(-(-3*X +1*X^2))), main = "Species B potential niche")
We then fit a classic SDM and a trophic SDM
# For species A, the two models coincide
m_A = glm(A ~ X + I(X^2), family = binomial(link = "logit"))
# For species B, the two models are different:
m_SDM = glm(B ~ X + I(X^2), family = binomial(link = "logit"))
m_trophic = glm(B ~ X + I(X^2) + A, family = binomial(link = "logit"))
We now test the relationships between model coefficients and whether the equality of model predictions still holds
# Equality of the coefficient does not hold anymore
all.equal(coef(m_SDM), coef(m_trophic)[-4] + coef(m_trophic)[4]*coef(m_A))
#> [1] "Mean relative difference: 2.305022"
# Predictions do not coincide anymore
pred_SDM = m_SDM$fitted.values
pred_trophic = predict(m_trophic,
newdata = data.frame(X = X, A = m_A$fitted.values))
all.equal(pred_SDM, pred_trophic)
#> [1] "Mean relative difference: 2.915665"
# Predictions are improved by trophic SDM
## AUC of species B for SDM
dismo::evaluate(p = pred_SDM[which(B==1)],
a = pred_SDM[which(B==0)] )@auc
#> [1] 0.8408333
## AUC of species B for trophic SDM
dismo::evaluate(p = pred_trophic[which(B==1)],
a = pred_trophic[which(B==0)] )@auc
#> [1] 0.8529167
The relationship between the coefficients of the two models does not hold anymore more, and predictions are now different, with trophic SDM that slightly improve classic SDM (AUC of species yB goes from 0.84 to 0.85).
We finally show how using a non-linear statistical model leads to different predictions between classical SDMs and a trophic SDM with the example of a random forest. Since we cannot provide any analytical derivation for random forest, we simply provide an example. Data are simulated exactly as in the very first examples (continuous observable entities).
# Simulate data for the basal species A
A = 3*X + 2*X^2 + rnorm(100, sd = 5)
# Simulate data for the predator species B
B = 2*X -3*X^2 + 3*A + rnorm(100, sd = 5)
oldpar = par()
par(mfrow=c(1,3))
plot(X,A, main = "Species A realised \n and potential niche")
plot(X,B, main = "Species B realised niche")
plot(X, 2*X -3*X^2, main = "Species B potential niche")
We infer statistical models with random forest
library(randomForest)
#> randomForest 4.7-1.1
#> Type rfNews() to see new features/changes/bug fixes.
# Species A
m_A = randomForest(A ~ X + I(X^2))
# Species B trophic SDM
m_trophic = randomForest(B ~ X + I(X^2) + A)
# Species B classic SDM
m_SDM <- randomForest(B ~ X + I(X^2))
We now compare the predictions from the two models.
# Prediction are now different
pred_SDM = m_SDM$predicted
pred_trophic = predict(m_trophic,
newdata = data.frame(X = X, A = m_A$predicted))
all.equal(pred_SDM, pred_trophic)
#> [1] "Mean relative difference: 0.08021014"
#Compare R2
cor(B, pred_SDM)^2
#> [1] 0.8075415
cor(B, pred_trophic)^2
#> [1] 0.8670482
Predictions from the model are different, and trophic SDM has a larger R2 than classical SDMs (0.86 versus 0.80).