Modeling Rule-Based Item Generation

Psychometrika

Abstract

An application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules is discussed. Within the families, the items are assumed to differ only in surface features. The parameters of the model are estimated in a Bayesian framework, using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation. Such algorithms have the potential to increase the cost-effectiveness of item generation as well as the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study conducted to evaluate parameter recovery are presented.
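The family structure described in the abstract can be illustrated by simulating data. What follows is a minimal sketch under stated assumptions, not the model specified in the article: it uses a simplified difficulty-only normal-ogive response function, takes each family's mean difficulty to be the sum of the effects of the design rules applied to that family, and lets items within a family vary around that mean to stand in for surface-feature differences. The names `simulate_family_items`, `simulate_responses`, `design_rows`, `rule_effects`, and `within_sd`, and all numeric values, are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_family_items(design_rows, rule_effects, within_sd, items_per_family, rng):
    """Draw item difficulties for each family.

    Assumption: the family mean is the sum of the effects of the rules
    switched on in that family's design row; items in the same family
    scatter around that mean with standard deviation within_sd.
    Returns a list of (family_index, difficulty) pairs.
    """
    items = []
    for family, row in enumerate(design_rows):
        family_mean = sum(used * effect for used, effect in zip(row, rule_effects))
        for _ in range(items_per_family):
            items.append((family, rng.gauss(family_mean, within_sd)))
    return items


def simulate_responses(abilities, items, rng):
    """Simulate 0/1 responses with P(correct) = Phi(ability - difficulty)."""
    return [
        [1 if rng.random() < normal_cdf(theta - b) else 0 for _, b in items]
        for theta in abilities
    ]


rng = random.Random(1)
design_rows = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]  # hypothetical rule combinations
rule_effects = [-0.5, 0.6, 0.9]                  # hypothetical rule effects
items = simulate_family_items(design_rows, rule_effects, 0.3, 4, rng)
abilities = [rng.gauss(0.0, 1.0) for _ in range(200)]
responses = simulate_responses(abilities, items, rng)
```

In this sketch, setting `within_sd` to zero makes every item in a family identical in difficulty, while larger values loosen the link between the design rules and item behavior.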

Author information

Corresponding author

Correspondence to Hanneke Geerlings.

About this article

Cite this article

Geerlings, H., Glas, C.A.W. & van der Linden, W.J. Modeling Rule-Based Item Generation. Psychometrika 76, 337–359 (2011). https://doi.org/10.1007/s11336-011-9204-x

  • DOI: https://doi.org/10.1007/s11336-011-9204-x
