jrt
This package provides user-friendly functions for the easy implementation of Item-Response Theory (IRT) models and scoring with judgment data. Although it can be used in a variety of contexts, it was originally written to facilitate the work of creativity researchers.
jrt is not an estimation package: it provides wrapper functions that call estimation packages and extract, report, and plot information from them. At this stage, jrt uses the (excellent) package mirt (Chalmers, 2012) as its only IRT engine. Thus, if you use jrt for your research, please be sure to cite mirt as the estimation package/engine.
We also encourage you to cite jrt, especially if you use the plots or the automatic model selection.
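Both recommended references can be printed from R with the base citation() utility, which works for any installed package:

# Print the recommended references for the estimation engine and for jrt
citation("mirt")
citation("jrt")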
Ok, now let's get started… A judgment data.frame is then provided to the jrt() function. Here we'll use the simulated one in jrt::ratings.

data <- jrt::ratings
It looks like this:
head(data)
#> Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6
#> 1 5 4 3 4 4 4
#> 2 3 3 2 3 2 2
#> 3 3 3 3 3 3 2
#> 4 3 2 2 3 4 2
#> 5 2 3 1 2 2 1
#> 6 3 2 2 3 2 1
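If you bring your own data, the expected layout is the same: one row per product, one column per judge, with numeric ratings. A quick structural check in base R (just an illustration, nothing jrt-specific):

# 300 products (rows) rated by 6 judges (columns)
dim(data)
# Each column should be a numeric/integer rating scale
sapply(data, class)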
jrt is in development and these features will hopefully appear soon (check back!), but in this release:

I know, that's a lot that you can't do… but this covers the typical cases, at least for the Consensual Assessment Technique, which is why the package was originally created.
jrt()
You will first want to load the library.
library(jrt)
#> Loading required package: directlabels
The main function of the jrt package is jrt(). By default, this function will:

- detect the possible response categories in the data,
- fit a series of IRT models and retain the best fitting one (based on AIC),
- compute factor scores and store them (with their standard errors) in the @factor.scores (or @output.data) slot of the jrt object.

Let's do it! (We'll assign the result to an object, here fit, to do more with it after.)
Note: There's a progress bar by default, but it takes space in the vignette, so I'll remove it here with progress.bar = F.

fit <- jrt(data, progress.bar = F)
#> The possible responses detected are: 1-2-3-4-5
#>
#> -== Model Selection (6 judges) ==-
#> AIC for Rating Scale Model: 4414.163 | Model weight: 0.000
#> AIC for Generalized Rating Scale Model: 4368.776 | Model weight: 0.000
#> AIC for Partial Credit Model: 4022.956 | Model weight: 0.000
#> AIC for Generalized Partial Credit Model: 4014.652 | Model weight: 0.000
#> AIC for Constrained Graded Rating Scale Model: 4399.791 | Model weight: 0.000
#> AIC for Graded Rating Scale Model: 4307.955 | Model weight: 0.000
#> AIC for Constrained Graded Response Model: 3999.248 | Model weight: 0.673
#> AIC for Graded Response Model: 4000.689 | Model weight: 0.327
#> -> The best fitting model is the Constrained Graded Response Model.
#>
#> -== General Summary ==-
#> - 6 Judges
#> - 300 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.977 | SD = 0.862
#>
#> -== IRT Summary ==-
#> - Model: Constrained (equal slopes) Graded Response Model (Samejima, 1969) | doi: 10.1007/BF03372160
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Factor scoring method: Expected A Posteriori (EAP)
#> - AIC = 3999.248 | BIC = 4091.843 | SABIC = 4091.843 | HQ = 4036.305
#>
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .893
#> - Expected reliability | Assumes a Normal(0,1) prior density: .894
Of course, there's more available here than one would report. If using IRT scoring (which is the main purpose of this package), we recommend primarily reporting which IRT model was selected, along with the IRT indices, since the scoring is based on the estimation of the \(\theta\) abilities. What is typically reported in this case is the empirical reliability (here .893), which is the estimate of the reliability of the observations in the sample. It can be interpreted similarly to other, more traditional indices of reliability (like Cronbach's \(\alpha\)).
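For intuition, the empirical reliability of EAP scores is commonly computed as the variance of the estimated scores divided by that variance plus the mean squared standard error. A hand computation from the stored scores (purely illustrative, and assuming the @factor.scores slot behaves like the data.frame shown further below; jrt already reports this value):

# Empirical reliability: score variance / (score variance + mean error variance)
fs <- fit@factor.scores
v <- var(fs$Judgments.Factor.Score)
v / (v + mean(fs$Judgments.Standard.Error^2))  # should be close to the reported .893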
Note: you can suppress this output entirely with silent = T.

fit <- jrt(data, silent = T)
One may of course select a model based on assumptions about the data rather than on model fit comparisons. This is done by passing the name of a model as input to the irt.model argument of the jrt() function, which bypasses the automatic model selection stage.
fit <- jrt(data, "PCM")
#> The possible responses detected are: 1-2-3-4-5
#>
#> -== General Summary ==-
#> - 6 Judges
#> - 300 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.977 | SD = 0.862
#>
#> -== IRT Summary ==-
#> - Model: Partial Credit Model (Masters, 1982) | doi: 10.1007/BF02296272
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Factor scoring method: Expected A Posteriori (EAP)
#> - AIC = 4022.956 | BIC = 4115.55 | SABIC = 4115.55 | HQ = 4060.012
#>
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .889
#> - Expected reliability | Assumes a Normal(0,1) prior density: .759
See the documentation for a list of available models. Most models are directly those of mirt. Others are versions of the Graded Response Model or Generalized Partial Credit Model that are constrained in various ways (equal discriminations and/or equal category structures) through the mirt.model() function of mirt. Note that models can also be called by their full names (e.g. jrt(data, "Graded Response Model")).
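For example, to refit the winner of the earlier automatic selection directly (assuming, as the selection output suggests, that its full name is accepted by irt.model):

# Fit the constrained (equal slopes) Graded Response Model without model selection
fit.cgrm <- jrt(data, "Constrained Graded Response Model", progress.bar = F)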
The factor scores and their standard errors are stored in the slot @factor.scores.

head(fit@factor.scores)
#> Judgments.Factor.Score Judgments.Standard.Error Judgments.Mean.Score
#> 1 1.7075935 0.5824540 4.000000
#> 2 -0.7213210 0.5581823 2.500000
#> 3 -0.1527368 0.5119554 2.833333
#> 4 -0.4246422 0.5319891 2.666667
#> 5 -2.2557844 0.6720457 1.833333
#> 6 -1.4155178 0.6202796 2.166667
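As a quick sanity check, the IRT factor scores should order products very similarly to the simple mean judgments; using the column names shown above (illustrative):

# IRT scores and raw mean scores should be strongly correlated
cor(fit@factor.scores$Judgments.Factor.Score,
    fit@factor.scores$Judgments.Mean.Score)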
Note: If you want a more complete output with the original data, use @output.data. If there were missing data, @output.data also appends imputed data.
head(fit@output.data)
#> Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6 Judgments.Factor.Score
#> 1 5 4 3 4 4 4 1.7075935
#> 2 3 3 2 3 2 2 -0.7213210
#> 3 3 3 3 3 3 2 -0.1527368
#> 4 3 2 2 3 4 2 -0.4246422
#> 5 2 3 1 2 2 1 -2.2557844
#> 6 3 2 2 3 2 1 -1.4155178
#> Judgments.Standard.Error Judgments.Mean.Score
#> 1 0.5824540 4.000000
#> 2 0.5581823 2.500000
#> 3 0.5119554 2.833333
#> 4 0.5319891 2.666667
#> 5 0.6720457 1.833333
#> 6 0.6202796 2.166667
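From there, exporting the scored data for further analyses is plain base R (the file name is of course arbitrary):

# Save the original ratings plus IRT scores to a CSV file
write.csv(fit@output.data, "jrt_scores.csv", row.names = FALSE)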
Judge characteristics can be inspected with Judge Category Curve (JCC) plots. They are computed with the function jcc.plot().
A basic example for Judge 3…
jcc.plot(fit, judge = 3)
Now of course, there are many options, but a few things that you could try:

- Plotting all judges, with judge = "all" or simply removing the judge argument (note that you can change the number of columns or rows; see the documentation for these advanced options):

jcc.plot(fit)

- Plotting only a selection of judges:

jcc.plot(fit, judge = c(1,6))

- Controlling the number of facet columns:

jcc.plot(fit, facet.cols = 2)

- Using greyscale = TRUE (this uses linetypes instead of colors)…

jcc.plot(fit, 1, greyscale = T)
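To save a plot to a file, wrapping the call in a graphics device is a generic base-R approach that works regardless of the underlying plotting system (file name and dimensions are arbitrary):

# Write the JCC plot for Judge 1 to a PNG file
png("jcc_judge1.png", width = 800, height = 600)
jcc.plot(fit, judge = 1)
dev.off()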