Demonstration of the smacpod package

Introduction

The smacpod package implements methods for analyzing case-control point pattern data. It relies on the spatstat package for many of the internal computations, and many function arguments are simply the relevant arguments of a spatstat function.

We briefly summarize functionality below using the grave data set. The grave data set provides information related to 143 medieval grave locations. 30 of the graves are "affected" by a tooth deformity, while the remaining graves are "unaffected" by the deformity. The grave data set is a ppp object, which is a class utilized by the spatstat package.

We load and plot the data to start.

# load packages
library(smacpod)
data(grave)
# pch changes point symbol
plot(grave, pch = c(1, 19), main = "grave locations", xlab = "x", ylab = "y")

Spatial density

The spdensity function computes the spatial density function of a point pattern. The spatstat package provides a density function, which is actually the estimated intensity of a point pattern. The spdensity function scales the intensity to integrate to 1. The result is an im object, a class from the spatstat package, which has an associated plot method. We estimate and plot the spatial density of the affected grave locations below. We do not specify the bandwidth or kernel function used in the estimation process, so the default choices of the spatstat package are used. The heat map of the estimated spatial density indicates greater density of grave locations in the lower right part of the study area.

# identify affected observations
aff <- (grave$marks == "affected")
# estimated spatial density function for affected observations
f <- spdensity(grave[aff,])
# plot spatial density function of affected
plot(f, "spatial density of affected graves")

Log relative risk

Let \(D\) denote the study area under consideration. Let \(\lambda_1(s)\) denote the intensity function at location \(s\) of the group designated as the case group. Then the spatial density of the case group can be denoted as \[f(s)=\frac{\lambda_1(s)}{\int_D \lambda_1(s) ds}.\] Similarly the intensity function and spatial denstiy function of the control group are denoted as \(\lambda_0(s)\) and \(g(s)\), respectively.

The log relative risk function is defined as \[r(s) = \ln\left(\frac{f(s)}{g(s)}\right).\] The log relative risk function can be estimated by \[\hat{r}(s) = \ln\left(\frac{\hat{f}(s)}{\hat{g}(s)}\right),\] where \(\hat{f}\) and \(\hat{g}\) are the estimates of \(f\) and \(g\) from the spdensity function.

The log relative risk can be estimated using the logrr function. The function takes:

If sigma and sigmacon aren’t specified, then the spatstat.explore::bw.relrisk function is used for these arguments.

We estimate the log relative risk below for the grave data, selecting the affected graves as the case group, and then plot the results. The logrr returns an object of class im, which has an associated plot method. We also add contours of \(\hat{r}(s)\) using the contour function. The ares with levels around 1-1.5 have the highest relative risk of cases to controls. These are the areas most likely to have a cluster of affected grave locations relative to unaffected grave locations.

rhat <- logrr(grave, case = "affected")
## affected has been selected as the case group
plot(rhat, main = "log relative risk of affected versus unaffected graves")
contour(rhat, add = TRUE)

If the ratio \(f(s)/g(s)\) is constant, then the relative risk is 1 and \(r(s)=0\). To determine where the log relative risk differs significantly from zero, we can simulate nsim new case-control data sets under the random labeling hypothesis, estimate \(r(s)\) for each simulated data set, and construct tolerance envelopes for \(r(s)\) at each location. If \(\hat{r}(s)\) is outside the tolerance envelopes, then the estimated log relative risk is inconsistent with \(r(s)=0\) under the random labeling hypothesis. This is the same is believing the spatial densities of the cases and controls differ from zero at that location. If the spatial density of the cases is greater than the spatial density of the controls, then we believe there is a cluster of cases relative to the controls. If the spatial density of the controls is greater than the spatial density of the cases, then we believe there is a cluster of controls relative to the cases. The tolerance envelopes can be constructed by specifying the following arguments of logrr:

When nsim>0, the logrr function returns an object of class renv, which has a plot method that makes it easy to identify the parts of the study area where the estimated log relative risk is outside the tolerance envelopes. Any colored area indicates a location where the log relative risk is outside the tolerance envelopes. Non-colored areas indicate the estimated log relative risk is inside the tolerance envelopes.

Similarly, there is a print method for the renv class summarizing information about the object constructed.

For the grave data, the yellow areas indicate parts of the study area where the affected grave locations are clustered relative to the unaffected grave locations beyond what is expected under the random labeling hypothesis. The purple areas indicate parts of the study area where the unaffected grave locations are clustered relative to the affected grave locations beyond what is expected under the random labeling hypothesis.

# construct tolerance envelopes for logrr
renv = logrr(grave, case = "affected", nsim = 99, level = 0.90)
## affected has been selected as the case group
# print info for tolerance envelopes
renv
## 
## pixelwise tolerance envelopes for log relative risk, r(s)
## 
## r(s) = ln[f(s)/g(s)]
## f(s) = spatial density of cases at location s
## g(s) = spatial density of controls at location s
## case label:  affected 
## control label:  unaffected 
## number of simulations: 99 
## simulation procedure: random labeling
## envelope level:  0.9
# plot areas with rhat outside of envelopes
plot(renv)