The smacpod package implements methods for analyzing case-control point pattern data. It relies on the spatstat package for many of the internal computations, and many function arguments are simply the relevant arguments of a spatstat function.
We briefly summarize functionality below using the grave
data set. The grave
data set provides information related
to 143 medieval grave locations. 30 of the graves are
"affected"
by a tooth deformity, while the remaining graves
are "unaffected"
by the deformity. The grave
data set is a ppp
object, which is a class utilized by the
spatstat package.
We load and plot the data to start.
The spdensity
function computes the spatial density
function of a point pattern. The spatstat package
provides a density
function, which is actually the
estimated intensity of a point pattern. The spdensity
function scales the intensity to integrate to 1. The result is an
im
object, a class from the spatstat
package, which has an associated plot
method. We estimate
and plot the spatial density of the affected
grave
locations below. We do not specify the bandwidth or kernel function used
in the estimation process, so the default choices of the
spatstat package are used. The heat map of the
estimated spatial density indicates greater density of grave locations
in the lower right part of the study area.
Let \(D\) denote the study area
under consideration. Let \(\lambda_1(s)\) denote the intensity
function at location \(s\) of the group
designated as the case
group. Then the spatial density of
the case
group can be denoted as \[f(s)=\frac{\lambda_1(s)}{\int_D \lambda_1(s)
ds}.\] Similarly the intensity function and spatial denstiy
function of the control
group are denoted as \(\lambda_0(s)\) and \(g(s)\), respectively.
The log relative risk function is defined as \[r(s) =
\ln\left(\frac{f(s)}{g(s)}\right).\] The log relative risk
function can be estimated by \[\hat{r}(s) =
\ln\left(\frac{\hat{f}(s)}{\hat{g}(s)}\right),\] where \(\hat{f}\) and \(\hat{g}\) are the estimates of \(f\) and \(g\) from the spdensity
function.
The log relative risk can be estimated using the logrr
function. The function takes:
x
: a ppp
object with marks
for the cases and controlssigma
: the bandwidth parameter of the case
groupsigmacon
: the bandwidth parameter of the control
groupcase
: a character string specifying the name of the
case group or the index of the case group in
levels(x$marks)
. The default is case = 2
, the
second level of x$marks
.If sigma
and sigmacon
aren’t specified,
then the spatstat.explore::bw.relrisk
function is used for
these arguments.
We estimate the log relative risk below for the grave
data, selecting the affected
graves as the
case
group, and then plot the results. The
logrr
returns an object of class im
, which has
an associated plot
method. We also add contours of \(\hat{r}(s)\) using the contour
function. The ares with levels around 1-1.5 have the highest relative
risk of cases to controls. These are the areas most likely to have a
cluster of affected grave locations relative to unaffected grave
locations.
## affected has been selected as the case group
plot(rhat, main = "log relative risk of affected versus unaffected graves")
contour(rhat, add = TRUE)
If the ratio \(f(s)/g(s)\) is
constant, then the relative risk is 1 and \(r(s)=0\). To determine where the log
relative risk differs significantly from zero, we can simulate
nsim
new case-control data sets under the random labeling
hypothesis, estimate \(r(s)\) for each
simulated data set, and construct tolerance envelopes for \(r(s)\) at each location. If \(\hat{r}(s)\) is outside the tolerance
envelopes, then the estimated log relative risk is inconsistent with
\(r(s)=0\) under the random labeling
hypothesis. This is the same is believing the spatial densities of the
cases and controls differ from zero at that location. If the spatial
density of the cases is greater than the spatial density of the
controls, then we believe there is a cluster of cases relative to the
controls. If the spatial density of the controls is greater than the
spatial density of the cases, then we believe there is a cluster of
controls relative to the cases. The tolerance envelopes can be
constructed by specifying the following arguments of
logrr
:
nsim
: the number of randomly labeled data sets to
generate.level
: the level of the tolerance envelopes. The
default is 0.90
.alternative
: the type of envelopes to construct. The
default is two.sided
.When nsim>0
, the logrr
function returns
an object of class renv
, which has a plot
method that makes it easy to identify the parts of the study area where
the estimated log relative risk is outside the tolerance envelopes. Any
colored area indicates a location where the log relative risk is outside
the tolerance envelopes. Non-colored areas indicate the estimated log
relative risk is inside the tolerance envelopes.
Similarly, there is a print
method for the
renv
class summarizing information about the object
constructed.
For the grave
data, the yellow areas indicate parts of
the study area where the affected
grave locations are
clustered relative to the unaffected
grave locations beyond
what is expected under the random labeling hypothesis. The purple areas
indicate parts of the study area where the unaffected
grave
locations are clustered relative to the affected
grave
locations beyond what is expected under the random labeling
hypothesis.
# construct tolerance envelopes for logrr
renv = logrr(grave, case = "affected", nsim = 99, level = 0.90)
## affected has been selected as the case group
##
## pixelwise tolerance envelopes for log relative risk, r(s)
##
## r(s) = ln[f(s)/g(s)]
## f(s) = spatial density of cases at location s
## g(s) = spatial density of controls at location s
## case label: affected
## control label: unaffected
## number of simulations: 99
## simulation procedure: random labeling
## envelope level: 0.9