% \VignetteIndexEntry{Users guide to geepack} % \VignetteKeyword{Generalized Estimating Equation} % \VignetteKeyword{Working correlation matrix} \documentclass{article} \usepackage{boxedminipage,color,a4,shortvrb,hyperref} %\usepackage[latin1]{inputenc} \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} \MakeShortVerb| \def\pkg#1{{\bf #1}} <>= require( geepack ) prettyVersion <- packageDescription("geepack")$Version prettyDate <- format(Sys.Date()) @ %% ,prefix.string=figures/geepack \SweaveOpts{keep.source=T} \title{On the usage of the \texttt{geepack} } \author{S\o ren H\o jsgaard and Ulrich Halekoh} \date{\pkg{geepack} version \Sexpr{prettyVersion} as of \Sexpr{prettyDate}} \begin{document} \SweaveOpts{concordance=TRUE} \parindent0pt\parskip4pt %% Efter preamble % \definecolor{myGray}{rgb}{0.95,0.95,0.95} % \makeatletter % \renewenvironment{Schunk}{ % \begin{lrbox}{\@tempboxa} % \begin{boxedminipage} % {\columnwidth}\scriptsize} % {\end{boxedminipage} % \end{lrbox}% % \colorbox{myGray}{\usebox{\@tempboxa}} % } % \makeatother \maketitle \tableofcontents \section{Introduction} \label{sec:introduction} \label{sec:intro} This note contains a few extra examples. We illustrate the usage of a the |waves| argument and the |zcor| argument together with a fixed working correlation matrix for the |geeglm()| function. \subsection{Citing \texttt{geepack}} The primary reference for the |geepack| package is \begin{quote} Halekoh, U., H\o jsgaard, S., Yan, J. (2006) {\em The R Package geepack for Generalized Estimating Equations (2006)} Journal of Statistical Software \url{https://www.jstatsoft.org/article/view/v015i02} \end{quote} @ <<>>= library(geepack) citation("geepack") @ %def If you use |geepack| in your own work, please do cite the above reference. \subsection{When do GEE's work best?} \label{sec:when-do-gees} \begin{enumerate} \item GEEs work best when you have relatively many relatively small clusters of about equal size in your data. \item If all your clusters are of size one you should not use GEEs; if all clusters are of size one a GEE corresponds to a generalized linear model. \item If you only have very few clusters (and in the extreme case only one cluster) you are likely to encounter numerical difficulties. \end{enumerate} NOTICE: Care must be taken with respect to the order in which the clusters appear in the dataset. See Section \ref{sec:waves} for details. \section{Simulating a dataset} \label{sec:simulating} To illustrate the usage of the |waves| argument and the |zcor| argument together with a fixed working correlation matrix for the |geeglm()| we simulate data suitable for a regression model. @ <<>>= library(geepack) n_cluster <- 6 n_time <- 5 set.seed(1213) timeorder <- rep(1:n_time, n_cluster) tvar <- timeorder + rnorm(length(timeorder)) idvar <- rep(1:n_cluster, each=n_time) uuu <- rep(rnorm(n_cluster), each=n_time) # A 'random intercept' yvar <- 1 + 2 * tvar + uuu + rnorm(length(tvar)) simdat <- data.frame(idvar, timeorder, tvar, yvar) head(simdat, 12) @ %def Notice that clusters of data appear together in |simdat| and that observations are ordered (according to |timeorder|) within clusters. We can fit a model with an AR(1) error structure as @ <<>>= mod1 <- geeglm(yvar~tvar, id=idvar, data=simdat, corstr="ar1") mod1 @ %def This works because observations are ordered according to time within each subject in the dataset. \section{Using the \texttt{waves} argument} \label{sec:waves} If observatios were not ordered according to cluster and time within cluster we would get the wrong result: @ <<>>= set.seed(123) simdatPerm <- simdat[sample(nrow(simdat)),] simdatPerm <- simdatPerm[order(simdatPerm$idvar),] head(simdatPerm) @ %def Notice that in |simdatPerm| data is ordered according to subject but the time ordering within subject is random. Fitting the model as before gives @ <<>>= mod2 <- geeglm(yvar~tvar, id=idvar, data=simdatPerm, corstr="ar1") mod2 @ %def Likewise if clusters do not appear contigously in data we also get the wrong result (the clusters are not recognized): @ <<>>= simdatPerm2 <- simdat[order(simdat$timeorder),] head(simdatPerm2) geeglm(yvar~tvar, id=idvar, data=simdatPerm2, corstr="ar1") @ %def To obtain the right result we must give the |waves| argument: @ <<>>= wav <- simdatPerm$timeorder wav mod3 <- geeglm(yvar~tvar, id=idvar, data=simdatPerm, corstr="ar1", waves=wav) mod3 @ %def \section{Using a fixed correlation matrix and the \texttt{zcor} argument} \label{sec:zcor} Suppose we want to use a fixed working correlation matrix: @ <<>>= cor.fixed <- matrix(c(1 , 0.5 , 0.25, 0.125, 0.125, 0.5 , 1 , 0.25, 0.125, 0.125, 0.25 , 0.25 , 1 , 0.5 , 0.125, 0.125, 0.125, 0.5 , 1 , 0.125, 0.125, 0.125, 0.125, 0.125, 1 ), 5, 5) cor.fixed @ %def Such a working correlation matrix has to be passed to |geeglm()| as a vector in the |zcor| argument. This vector can be created using the |fixed2Zcor()| function: @ <<>>= zcor <- fixed2Zcor(cor.fixed, id=simdatPerm$idvar, waves=simdatPerm$timeorder) zcor @ %def Notice that |zcor| contains correlations between measurements within the same cluster. Hence if a cluster contains only one observation, then there will be generated no entry in |zcor| for that cluster. Now we can fit the model with: @ <<>>= mod4 <- geeglm(yvar~tvar, id=idvar, data=simdatPerm, corstr="fixed", zcor=zcor) mod4 @ %def \end{document}