| Title: | Tools for Multiple Imputation of Missing Data |
|---|---|
| Description: | Tools to perform analyses and combine results from multiple-imputation datasets. |
| Authors: | Thomas Lumley |
| Maintainer: | Thomas Lumley <[email protected]> |
| License: | GPL-2 |
| Version: | 2.4 |
| Built: | 2026-05-27 09:29:18 UTC |
| Source: | https://github.com/cran/mitools |
Create and update imputationList objects to be used as input to other
MI routines.

    imputationList(datasets,...)

    ## Default S3 method:
    imputationList(datasets,...)

    ## S3 method for class 'character'
    imputationList(datasets,dbtype,dbname,...)

    ## S3 method for class 'imputationList'
    update(object,...)

    ## S3 method for class 'imputationList'
    rbind(...)

    ## S3 method for class 'imputationList'
    cbind(...)

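
| Argument | Description |
|---|---|
| datasets | a list of data frames corresponding to the multiple imputations, or a list of names of database tables or views |
| dbtype | "ODBC", or the name of a DBI-compatible database driver |
| dbname | Name of the database |
| object | An object of class imputationList or DBimputationList |
| ... | Further arguments. For a database-backed list they are passed to dbConnect() or odbcConnect(); for update() they are the variable definitions to add |
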
When the arguments to imputationList() are character strings, a
database-backed imputation list is created. This can be a database
accessed through ODBC with the RODBC package or a database with a
DBI-compatible driver. The dbname and ... arguments are
passed to dbConnect() or odbcConnect() to create a
database connection. Data are read from the database as needed.
For a database-backed object the update() method creates variable
definitions that are evaluated as the data are read, so that read-only
access to the database is sufficient.
An object of class imputationList or DBimputationList
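The in-memory workflow can be sketched with simulated data, without any files or a database. This is a minimal illustration only: the helper make_imputations() and the variables id, x, sex, and positive are hypothetical names invented for the sketch, and the "imputations" are random draws rather than output from a real imputation model. Only imputationList(), update(), and rbind() come from the package.

```r
library(mitools)

set.seed(42)

# Hypothetical helper: m "imputed" copies of a small data frame.
# Real imputations would come from an imputation model, not rnorm().
make_imputations <- function(n, m = 3) {
  lapply(seq_len(m), function(i) data.frame(id = seq_len(n), x = rnorm(n)))
}

men   <- imputationList(make_imputations(5))
women <- imputationList(make_imputations(4))

# update() adds the new variable to every imputed dataset
men   <- update(men, sex = 1)
women <- update(women, sex = 0)

# rbind() stacks corresponding imputations row-wise
both <- rbind(men, women)

# Derived variables can refer to columns that already exist
both <- update(both, positive = x > 0)

both
```

Each of the three stacked datasets should then hold the five rows from men followed by the four rows from women.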

    ## Not run:
    ## CRAN doesn't like this example
    data.dir <- system.file("dta",package="mitools")
    files.men <- list.files(data.dir,pattern="m.\\.dta$",full=TRUE)
    men <- imputationList(lapply(files.men, foreign::read.dta))
    files.women <- list.files(data.dir,pattern="f.\\.dta$",full=TRUE)
    women <- imputationList(lapply(files.women, foreign::read.dta))
    men <- update(men, sex=1)
    women <- update(women,sex=0)
    all <- rbind(men,women)
    all <- update(all, drinkreg=as.numeric(drkfre)>2)
    all
    ## End(Not run)

Combines results of analyses on multiply imputed data sets. A generic
function with methods for imputationResultList objects and a
default method. In addition to point estimates and variances,
MIcombine computes Rubin's degrees-of-freedom estimate and rate
of missing information.

    MIcombine(results, ...)

    ## Default S3 method:
    MIcombine(results,variances,call=sys.call(),df.complete=Inf,...)

    ## S3 method for class 'imputationResultList'
    MIcombine(results,call=NULL,df.complete=Inf,...)

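
| Argument | Description |
|---|---|
| results | A list of results from inference on separate imputed datasets |
| variances | If results is a list of parameter vectors, a list of the corresponding variance-covariance matrices |
| call | A function call for labelling the results |
| df.complete | Complete-data degrees of freedom |
| ... | Other arguments, not used |
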
The
results argument in the default method may be either a list of
parameter vectors or a list of objects that have coef and
vcov methods. In the former case a list of variance-covariance
matrices must be supplied as the second argument.
The complete-data degrees of freedom are used when a complete-data analysis would use a t-distribution rather than a Normal distribution for confidence intervals, such as some survey applications.
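The quantities involved can be illustrated for a single scalar parameter. The sketch below uses the textbook form of Rubin's rules with the Barnard-Rubin adjustment for finite complete-data degrees of freedom. It is an assumption that this matches the package's internals in every detail, and pool_scalar() is a hypothetical name that is not part of mitools.

```r
# Pool one scalar parameter across m imputations using Rubin's rules.
# pool_scalar() is a hypothetical helper for illustration, not a mitools function.
pool_scalar <- function(estimates, variances, df.complete = Inf) {
  m <- length(estimates)
  stopifnot(m > 1, length(variances) == m)

  qbar  <- mean(estimates)           # pooled point estimate
  ubar  <- mean(variances)           # within-imputation variance
  b     <- var(estimates)            # between-imputation variance
  total <- ubar + (1 + 1 / m) * b    # total variance

  r  <- (1 + 1 / m) * b / ubar       # relative increase in variance
  df <- (m - 1) * (1 + 1 / r)^2      # Rubin's large-sample degrees of freedom

  if (is.finite(df.complete)) {
    # Barnard-Rubin small-sample adjustment
    df_obs <- (df.complete + 1) / (df.complete + 3) * df.complete * ubar / total
    df <- 1 / (1 / df + 1 / df_obs)
  }

  missinfo <- (r + 2 / (df + 3)) / (r + 1)  # rate of missing information

  list(estimate = qbar, variance = total, df = df, missinfo = missinfo)
}

# Three imputations with equal within-imputation variance
pooled   <- pool_scalar(c(1, 2, 3), c(0.5, 0.5, 0.5))
pooled_t <- pool_scalar(c(1, 2, 3), c(0.5, 0.5, 0.5), df.complete = 20)
```

Supplying a finite df.complete can only reduce the pooled degrees of freedom, which widens the resulting t-based confidence intervals.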
An object of class MIresult with summary and
print methods
MIextract, with.imputationList

    data(smi)
    models<-with(smi, glm(drinkreg~wave*sex,family=binomial()))
    summary(MIcombine(models))
    betas<-MIextract(models,fun=coef)
    vars<-MIextract(models, fun=vcov)
    summary(MIcombine(betas,vars))

Used to extract parameter estimates and standard errors from
lists produced by with.imputationList.

    MIextract(results, expr, fun)


| Argument | Description |
|---|---|
| results | A list of objects |
| expr | an expression |
| fun | a function of one argument |

If expr is supplied, it is evaluated in each element of
results. Otherwise each element of results is passed as
an argument to fun.
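The difference between the two forms can be seen on a small hand-built list. This is a sketch under stated assumptions: the list results and its components est and se are hypothetical stand-ins for real per-imputation output.

```r
library(mitools)

# Hypothetical per-imputation results: plain lists holding an estimate
# and a standard error
results <- list(
  list(est = 1.0, se = 0.10),
  list(est = 1.2, se = 0.12),
  list(est = 0.9, se = 0.11)
)

# expr form: the expression is evaluated inside each element,
# so `est` refers to that element's component
ests <- MIextract(results, expr = est)

# fun form: each element is passed to the function as its argument
ses <- MIextract(results, fun = function(res) res$se)
```

The expr form is convenient when the value of interest is a named component of each result; the fun form suits accessor functions such as coef or vcov.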
A list
with.imputationList, MIcombine

    data(smi)
    models<-with(smi, glm(drinkreg~wave*sex,family=binomial()))
    betas<-MIextract(models,fun=coef)
    vars<-MIextract(models, fun=vcov)
    summary(MIcombine(betas,vars))

Data on maths performance, gender, some problem-solving variables and some school resource variables. This is actually a weighted survey: see withPV.survey.design in the survey package for a better analysis.

    data("pisamaths")

A data frame with 4291 observations on the following 26 variables.

| Variable | Description |
|---|---|
| SCHOOLID | School ID |
| CNT | Country id: a factor with levels New Zealand |
| STRATUM | a factor with levels NZL0101 NZL0102 NZL0202 NZL0203 |
| OECD | Is the country in the OECD? |
| STIDSTD | Student ID |
| ST04Q01 | Gender: a factor with levels Female Male |
| ST14Q02 | Mother has university qualifications: No Yes |
| ST18Q02 | Father has university qualifications: No Yes |
| MATHEFF | Mathematics Self-Efficacy: numeric vector |
| OPENPS | Openness for Problem Solving: numeric vector |
| PV1MATH, PV2MATH, PV3MATH, PV4MATH, PV5MATH | 'Plausible values' (multiple imputations) for maths performance |
| W_FSTUWT | Design weight for student |
| SC35Q02 | Proportion of maths teachers with professional development in maths in past year |
| PCGIRLS | Proportion of girls at the school |
| PROPMA5A | Proportion of maths teachers with ISCED 5A (math major) |
| ABGMATH | Does the school group maths students: a factor with levels "No ability grouping between any classes", "One of these forms of ability grouping between classes for s", "One of these forms of ability grouping for all classes" |
| SMRATIO | Number of students per maths teacher |
| W_FSCHWT | Design weight for school |
| condwt | Design weight for student given school |

A subset extracted from the PISA2012lite R package, https://github.com/pbiecek/PISA2012lite
OECD (2013) PISA 2012 Assessment and Analytical Framework: Mathematics, Reading, Science, Problem Solving and Financial Literacy. OECD Publishing.

    data(pisamaths)
    means<-withPV(list(maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH), data=pisamaths,
                  action= quote(by(maths, ST04Q01, mean)), rewrite=TRUE)
    means
    models<-withPV(list(maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH), data=pisamaths,
                   action= quote(lm(maths~ST04Q01*PCGIRLS)), rewrite=TRUE)
    summary(MIcombine(models))

An imputationList object containing five imputations of data
from the Victorian Adolescent Health Cohort Study.

    data(smi)

The underlying data are in a data frame with 1170 observations on the following 12 variables.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a factor with levels Non drinker not in last wk <3 days last wk >=3 days last wk
a factor with levels Non drinker not in last wk av <5units/drink_day av =>5units/drink_day
a numeric vector
a factor with levels non/ex-smoker <6 days 6/7 days
a numeric vector
a numeric vector
a numeric vector
a logical vector
Carlin, JB, Li, N, Greenwood, P, Coffey, C. (2003) "Tools for analysing multiple imputed datasets" The Stata Journal 3; 3: 1-20.

    data(smi)
    with(smi, table(sex, drkfre))
    model1<-with(smi, glm(drinkreg~wave*sex, family=binomial()))
    MIcombine(model1)
    summary(MIcombine(model1))

Performs a computation on each of the imputed datasets in data.

    ## S3 method for class 'imputationList'
    with(data, expr, fun, ...)


| Argument | Description |
|---|---|
| data | An imputationList object |
| expr | An expression |
| fun | A function taking a data frame argument |
| ... | Other arguments, passed to fun |

If expr is supplied, evaluate it in each dataset in data;
if fun is supplied, it is evaluated on each dataset. If all the
results inherit from "imputationResult" the return value is an
imputationResultList object, otherwise it is an ordinary list.
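Both calling forms, and the hand-off to MIcombine, can be sketched on simulated data. This is an illustration under stated assumptions: the object imps and its columns x and y are hypothetical, and the "imputations" are independent random draws rather than real imputed datasets.

```r
library(mitools)

set.seed(1)

# Hypothetical imputations: three noisy copies of a simple linear relationship
imps <- imputationList(lapply(1:3, function(i) {
  x <- rnorm(50)
  data.frame(x = x, y = 2 * x + rnorm(50))
}))

# expr form: the expression is evaluated with each dataset's columns in scope
fits <- with(imps, lm(y ~ x))

# fun form: the function receives each dataset as a data frame
row_counts <- with(imps, fun = nrow)

# lm fits have coef() and vcov() methods, so the list can be pooled directly
pooled <- MIcombine(fits)
summary(pooled)
```

Because lm objects do not inherit from "imputationResult", fits here is an ordinary list, which MIcombine handles through its default method.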
Either a list or an imputationResultList object

    data(smi)
    models<-with(smi, glm(drinkreg~wave*sex,family=binomial()))
    tables<-with(smi, table(drkfre,sex))
    with(smi, fun=summary)

Repeats an analysis for each of a set of 'plausible values' in a data
set, returning a list suitable for MIcombine. That is, the data
set contains some sets of columns where each set consists of multiple
imputations of the same variable. With
rewrite=TRUE, the action is rewritten to reference each
plausible value in turn; with rewrite=FALSE a new data set is
constructed for each plausible value, which is slower but more general.

    withPV(mapping, data, action, rewrite=TRUE, ...)

    ## Default S3 method:
    withPV(mapping, data, action, rewrite=TRUE,...)


| Argument | Description |
|---|---|
| mapping | A formula or list of formulas describing each variable in the analysis that has plausible values. The left-hand side of the formula is the name to use in the analysis; the right-hand side gives the names in the dataset. |
| data | A data frame. Methods for |
| action | With |
| rewrite | Rewrite |
| ... | For methods |

A list of the results returned by each evaluation of action, with the call as an attribute.
I would be interested in seeing naturally-occurring examples where
rewrite=TRUE does not work

    data(pisamaths)
    models<-withPV(list(maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH), data=pisamaths,
                   action= quote(lm(maths~ ST04Q01*(PCGIRLS+SMRATIO)+MATHEFF+OPENPS, data=.DATA)),
                   rewrite=FALSE
    )
    summary(MIcombine(models))

    ## equivalently
    models2<-withPV(list(maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH), data=pisamaths,
                    action=quote( lm(maths~ST04Q01*(PCGIRLS+SMRATIO)+MATHEFF+OPENPS)),
                    rewrite=TRUE)
    summary(MIcombine(models2))
