| Title: | Regression Functions for Two-phase Response-selective Sampled Data |
|---|---|
| Description: | Performs a variety of regression analyses using semiparametric maximum likelihood for data subject to response selection and two-stage missingness. |
| Authors: | Chris Wild <[email protected]>, with contributions from Yannan Jiang <[email protected]>. |
| Maintainer: | Thomas Lumley <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 3.1-1 |
| Built: | 2026-06-26 02:45:29 UTC |
| Source: | https://github.com/tslumley/missreg3 |
Fits binary regression models to data with the two-phase missingness structure. This class includes stratified case-control data.
bin2stg(formula, weights = NULL, xstrata = NULL, obstype.name = "obstype",
        data, fit = TRUE, xs.includes = FALSE, linkname = "logit",
        start = NULL, Qstart = NULL, int.rescale = TRUE, off.set = NULL,
        control = mlefn.control(...),
        control.inner = mlefn.control.inner(...), ...)
formula |
A symbolic description of the model to be fitted. If only one non-NA level of the response variable is present in the data, that level is treated as "failure" (control). |
weights |
An optional vector of weights to be used in the fitting process.
Should be |
xstrata |
Specify names of the stratification variables to be used, e.g.
|
obstype.name |
Name of the variable specifying labels for observations by sampling
and variable type: |
data |
A data frame containing all the variables required for analysis,
including those for |
fit |
If |
xs.includes |
|
linkname |
A specification for the model link function. Three choices are provided:
|
start |
Starting values for the regression parameters. Can be compulsory if the program
cannot produce a valid starting value in some situations.
|
Qstart |
An optional starting matrix for Pr(Y=i|Xstratum=j). The first row should relate to the successes (cases) and the second to the failures (controls). Can be compulsory if the program cannot produce a valid starting value in some situations. |
int.rescale |
If |
off.set |
Specify an |
control |
Specify control parameters for the iterations in |
control.inner |
Specify control parameters for inner iterations nested within
|
... |
Further arguments passed to or from related functions. |
This function fits binary regression models using several links with various types
of observations collected under different two-phase sampling schemes. More detailed
descriptions of the function and its applications can be found in "Description
of the missreg Library" (Wild and Jiang).
missReport |
Matrix containing information on deleted records with missing observations. |
StrReport |
Cross tabulation of counts for different levels of |
xStrReport |
Cross tabulation of counts for |
key |
Specify detailed classification for each of the X-strata. |
yKey |
Specify the Y variable and its level that the model is constructed for. |
fit |
|
error |
The error messages returned by |
coefficients |
The coefficients matrix with estimates, standard errors, z values and associated p-values. |
loglk |
Log-likelihood returned from final |
score |
Score vector returned from final |
inf |
Observed information matrix returned from final |
fitted |
The fitted values of Y obtained by transforming the linear predictors by the inverse of the link function. |
cov |
The asymptotic covariance matrix (inverse of the information matrix). |
cor |
The asymptotic correlation matrix. |
Qmat |
The estimated Pr(Y=i|Xstratum=j) from the last iteration. |
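The relationship between the inf, cov, cor and coefficients components can be illustrated with base R alone. The sketch below is not the package's internal code: est and info are made-up values rather than bin2stg output, and the table is built with the usual Wald construction, which is assumed here to match what the coefficients matrix reports.

```r
# Hypothetical estimates and observed information matrix (illustration only)
est  <- c("(Intercept)" = -1.2, x = 0.8)
info <- matrix(c(50, 10,
                 10, 40), nrow = 2, byrow = TRUE,
               dimnames = list(names(est), names(est)))

cov_mat <- solve(info)          # asymptotic covariance = inverse information
cor_mat <- cov2cor(cov_mat)     # asymptotic correlation matrix
se      <- sqrt(diag(cov_mat))  # standard errors
z       <- est / se             # Wald z values
p       <- 2 * pnorm(-abs(z))   # two-sided p-values

coef_table <- cbind(Estimate = est, "Std. Error" = se,
                    "z value" = z, "Pr(>|z|)" = p)
print(round(coef_table, 4))
```

With a real fit, the same steps would start from the fitted object's inf component and its estimates instead of the hypothetical values above.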
The function summary.bin2stg provides a complete summary of
the regression results including the Wald tests and a regression panel.
The related output functions (print.bin2stg, summary.bin2stg
and print.summary.bin2stg) currently have no help files.
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
data(leprosy1)
leprosy1$age.trans <- 100 * (leprosy1$age + 7.5)^-2
z1 <- bin2stg(leprosy ~ age.trans + scar, data=leprosy1, weights=counts,
              xstrata="age", xs.includes=TRUE)
summary(z1)

data(leprosy2)
leprosy2$age.trans <- 100 * (leprosy2$age + 7.5)^-2
z2 <- bin2stg(cbind(case,control) ~ age.trans + scar, data=leprosy2,
              xstrata="age", xs.includes=TRUE)
summary(z2)

data(leprosy3)
leprosy3$age.trans <- 100 * (leprosy3$age + 7.5)^-2
z3 <- bin2stg(leprosy ~ age.trans + scar, data=leprosy3, weights=counts,
              xs.includes=TRUE)
summary(z3)

data(wilms.sub)
z4 <- bin2stg(cbind(case,control) ~ stage*hist, xstrata=c("stage","inst"),
              xs.includes=TRUE, data=wilms.sub)
summary(z4)

data(trawl)
attach(trawl)
# 265 out of 787 fish in fine net have length over 35 (caught37=NA)
# 353 out of 738 fish in test net have length over 35 (caught37=1)
# So 738 were caught from (estimate) 353*787/265 that entered
# est. pr(caught) assuming all fish over len=35 are caught
phat <- 738 / (787*353/265)
z5 <- bin2stg(caught37 ~ I(length-35), weights=count, data=trawl,
              start=c(log(phat/(1-phat)),0), Qstart=matrix(c(phat,1-phat)))
summary(z5)

data(lowbirth.bin)
z6 <- bin2stg(sgagp~mumht+bmi+I(bmi^2) + ethnicdb + factor(occ)+ hyper + smoke,
              weights=counts, xstrata=c("ethnicdb","smokedb"),
              obstype.name=c("instudy"), data=lowbirth.bin, xs.includes=FALSE)
summary(z6)
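The starting values used in the trawl example follow from simple arithmetic on the counts quoted in its comments. The sketch below reproduces that arithmetic in base R without calling bin2stg; the variable names (n_fine_total and so on) are introduced here for illustration only.

```r
# Counts quoted in the comments of the trawl example
n_fine_total <- 787   # fish in the fine net
n_fine_over  <- 265   # of those, fish with length over 35
n_test_total <- 738   # fish caught in the test net
n_test_over  <- 353   # of those, fish with length over 35

# Estimated number of fish that entered the test net, assuming every fish
# with length over 35 that enters is caught
n_entered <- n_test_over * n_fine_total / n_fine_over

phat <- n_test_total / n_entered           # estimated Pr(caught)
start_intercept <- log(phat / (1 - phat))  # logit of phat, as passed to start=
Qstart <- matrix(c(phat, 1 - phat))        # 2 x 1 matrix, as passed to Qstart=

print(round(c(n_entered = n_entered, phat = phat,
              start_intercept = start_intercept), 3))
```

The slope starting value in the example is simply 0, so only the intercept needs this calculation.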
Fits bivariate binary regression models to data with two correlated binary Y-variables and two-phase missingness structure.
bivbin2stg(formula1, formula2, formula3, weights = NULL, xstrata = NULL,
           obstype.name = "obstype", data, fit = TRUE, xs.includes = FALSE,
           y1samp = TRUE, method = "palmgren", start = NULL, Qstart = NULL,
           off.set = NULL, control = mlefn.control(...),
           control.inner = mlefn.control.inner(...), ...)
formula1 |
A symbolic description of the model to be fitted for Y1,
the binary response defining the case-control status of subjects.
When the |
formula2 |
A symbolic description of the model to be fitted for Y2, the second binary response of interest correlated with Y1. |
formula3 |
A symbolic description of the model to be fitted quantifying
the association between Y1 and Y2. |
weights |
An optional vector of weights to be used in the fitting process. Should
be |
xstrata |
Specify names of the stratification variables to be used, e.g.
|
obstype.name |
Name of the variable specifying labels for observations by sampling
and variable type: |
data |
A data frame containing all the variables required for analysis, including
those for |
fit |
If |
xs.includes |
|
y1samp |
|
method |
Four methods are implemented: |
start |
Starting values for the regression parameters in Y1-model, Y2-model and the association model when applicable. |
Qstart |
An optional starting matrix for Pr(Ystratum=i | Xstratum=j). Can be compulsory if the program cannot produce a valid starting value in some situations. |
off.set |
Specify an |
control |
Specify control parameters for the iterations in |
control.inner |
Specify control parameters for inner iterations nested within |
... |
Further arguments passed to or from related functions. |
This function fits bivariate binary regression models to two correlated binary outcomes Y1 and
Y2 using several models with various types of observations collected under different two-phase
sampling schemes.
The joint distribution of Y1 and Y2 given X can be modelled using the marginal
distributions Pr(Y1|X) and Pr(Y2|X) along with an association model between Y1
and Y2. Currently implemented models for this approach are the Palmgren, Bahadur
and Copula models. When only Pr(Y2|X) is of interest, a further semiparametric
approach (the spml2 method) can be used instead, based on the conditional factorisation
Pr(Y1|Y2, X)*Pr(Y2|X) with both factors modelled parametrically.
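To make the "marginals plus association" construction concrete, the sketch below builds the four joint cell probabilities of (Y1, Y2) from two marginal probabilities and an odds ratio. This is the textbook parameterisation usually associated with the Palmgren model, not the code used inside bivbin2stg. The function name joint_from_or and the numeric inputs are hypothetical.

```r
# Joint cell probabilities of two binary variables from their marginals
# p1 = Pr(Y1 = 1), p2 = Pr(Y2 = 1) and the odds ratio psi between them
joint_from_or <- function(p1, p2, psi) {
  if (isTRUE(all.equal(psi, 1))) {
    p11 <- p1 * p2                       # independence
  } else {
    a   <- 1 + (p1 + p2) * (psi - 1)
    p11 <- (a - sqrt(a^2 - 4 * psi * (psi - 1) * p1 * p2)) / (2 * (psi - 1))
  }
  c(p11 = p11, p10 = p1 - p11, p01 = p2 - p11, p00 = 1 - p1 - p2 + p11)
}

cells <- joint_from_or(p1 = 0.3, p2 = 0.4, psi = 2)

# The odds ratio recomputed from the cells should reproduce psi
or_check <- unname(cells["p11"] * cells["p00"] / (cells["p10"] * cells["p01"]))
print(round(cells, 4))
print(or_check)
```

In a regression setting each of p1, p2 and log(psi) would depend on covariates through its own linear predictor, which is the role played by formula1, formula2 and formula3.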
More detailed descriptions of this function can be found in "Description of the
missreg Library" (Wild and Jiang).
missReport |
Matrix containing information on deleted records with missing observations. |
StrReport |
Cross tabulation of counts for different levels of |
xStrReport |
Cross tabulation of counts for |
key |
Specify detailed classification for each of the X-strata. |
ykey |
Specify the Y-variables that the model is being constructed for. |
fit |
|
error |
The error messages returned by |
coefficients |
The coefficients matrix with estimates, standard errors, z values and associated p-values. Will report separately for each marginal model used. |
loglk |
Log-likelihood returned from final |
score |
Score vector returned from final |
inf |
Observed information matrix returned from final |
fitted.Y2 |
The fitted values of Y2 obtained by transforming the linear predictors by the inverse of the link function. Note that all methods we have implemented evaluate Pr(Y2|X) which is normally the model of interest. |
cov |
The asymptotic covariance matrix (inverse of the information matrix). |
cor |
The asymptotic correlation matrix. |
Qmat |
The estimated Pr(Ystratum=i|Xstratum=j) from the last iteration. |
...
The function summary.bivbin2stg provides a complete summary of the regression
results including the Wald tests and a regression model. The related output functions
(print.bivbin2stg, summary.bivbin2stg and print.summary.bivbin2stg)
currently have no help files.
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
### SAMPLING ON CASE-CONTROL INFORMATION OF Y1 ONLY ###
data(cotdeath)
z1 <- bivbin2stg(y1~x, y2~x, ~x, weights=wts, data=cotdeath,
                 xs.includes=TRUE, method="palmgren")
summary(z1)

z2 <- bivbin2stg(y1~x, y2~x, ~x, weights=wts, data=cotdeath,
                 xs.includes=TRUE, method="bahadur")
summary(z2)

z3 <- bivbin2stg(y1~x, y2~x, ~x, weights=wts, data=cotdeath,
                 xs.includes=TRUE, method="copula")
summary(z3)

z4 <- bivbin2stg(y1~x*y2, y2~x, weights=wts, data=cotdeath,
                 xs.includes=TRUE, method="spml2")
summary(z4)

data(infarct)
z5 <- bivbin2stg(sgagp~ethnic+smoked+hyper+mumwt+mumwtc2+agepreg,
                 anyinf~smoked+hyper+age1st, ~age1st, weights=count,
                 xstrata=c("sex", "gest"), obstype.name="instudy",
                 data=infarct, xs.includes=TRUE, method="palmgren")
summary(z5)

### SAMPLING ON ALL CONTROLS (Y1=0 AND Y2=0) OR NOT ###
data(dat00)
z6 <- bivbin2stg(y1~x, y2~x, ~x, weights=wts, data=dat00, y1samp=FALSE,
                 xstrata="v", xs.includes=FALSE, method="palmgren")
summary(z6)
Fits bivariate binary-linear regression models to data with two associated response variables, binary Y1 and continuous Y2, and two-phase missingness structure.
bivlocsc2stg(formula1, formula2, formula3, weights = NULL, xstrata = NULL,
             data, obstype.name = "obstype", fit = TRUE, xs.includes = FALSE,
             off.set = NULL, errdistrn = "normal", errmodpars = 6,
             start = NULL, Qstart = NULL, control = mlefn.control(...),
             control.inner = mlefn.control.inner(...), ...)
formula1 |
A symbolic description of the model to be fitted for Y1|Y2, where Y1 is the binary response defining the case-control status of subjects and Y2 is a continuous response of interest observed at the second phase. |
formula2 |
A symbolic description of the location model to be fitted for Y2. |
formula3 |
A symbolic description of the log-scale model to be fitted for Y2. |
weights |
An optional vector of weights to be used in the fitting process.
Should be |
xstrata |
Specify names of the stratification variables to be used, e.g.
|
obstype.name |
Name of the variable specifying labels for observations
by sampling and variable type: |
data |
The data |
fit |
If |
xs.includes |
|
off.set |
Specify an |
errdistrn |
A specification for the error distribution. Three choices are provided:
standard logistic ( |
errmodpars |
Set parameter values for the error distribution. The default is 6 for Student's t distribution. |
start |
Starting values for the regression parameters. |
Qstart |
An optional starting matrix for Pr(Ystratum=i|Xstratum=j). |
control |
Specify control parameters for the iterations in |
control.inner |
Specify control parameters for inner iterations nested within
|
... |
Further arguments passed to or from related functions. |
This function extends the SPML2 method to the case where Y2, the second response of interest
associated with Y1, is a continuous variable suited to analysis under a location-scale model.
In particular, it uses a logistic regression model for Y1|Y2, as in bivbin2stg when the
SPML2 method is applied, but a linear regression model for Y2 itself. Although the function
allows for different error distributions ("logistic", "normal", and "t" are
implemented so far), only the normal is assumed in the strata function, so only the normal should be used at this stage.
missReport |
Matrix containing information on deleted records with missing observations. |
StrReport |
Cross tabulation of counts for different levels of |
xStrReport |
Cross tabulation of counts for |
key |
Specify detailed classification for each of the X-strata. |
ykey |
Vector containing the names of the Y-variables and the name of the level of Y1 that the model is being constructed for. The order is (name of Y1, name of the level at Y1=1, name of Y2). |
fit |
|
error |
The error messages returned by |
coefficients |
The coefficients matrix with estimates, standard errors, z-values and associated p-values. |
loglk |
Log-likelihood returned from final |
score |
Score vector returned from final |
inf |
Observed information matrix returned from final |
fitted |
The fitted values of Y2 obtained from the model. |
cov |
The asymptotic covariance matrix (inverse of the information matrix). |
cor |
The asymptotic correlation matrix. |
Qmat |
The estimated Pr(Ystratum=i|Xstratum=j) from the last iteration. |
The function summary.bivlocsc2stg gives a complete summary of the regression results
including the Wald tests and a regression panel. The related output functions (print.bivlocsc2stg,
summary.bivlocsc2stg and print.summary.bivlocsc2stg) currently have no
help files.
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
## Data Generation ##
N <- 5000
x <- rnorm(N)
eps <- rnorm(N)
theta2 <- c(0.5,1,0)
y2 <- theta2[1]+theta2[2]*x+exp(theta2[3])*eps
theta1 <- c(-3,-0.5,1,0.5)
eta1 <- theta1[1]+theta1[2]*y2+theta1[3]*x+theta1[4]*y2*x
p1 <- plogis(eta1)
y1 <- 1*(runif(N)<p1)
xcut <- c(-30,-1,0,1,30)
xstrata <- as.numeric(cut(x,xcut))
indca <- (1:N)[y1==1]
indct <- sample((1:N)[y1==0],length(indca))
ind <- sort(c(indca,indct))
rest <- (1:N)[-ind]
obstype <- rep("retro",N)
obstype[rest] <- "strata"
y2[rest] <- NA; x[rest] <- NA
dat <- data.frame(y1,y2,x,xstrata,obstype)

## Proportion of cases in population (about 0.1) ##
prca <- length(indca)/N
prca

## Model fit ##
z <- bivlocsc2stg(y1~y2*x,y2~x,~1,xstrata="xstrata",data=dat,xs.includes=FALSE)
summary(z)
The leprosy data set was described in Scott and Wild (1997, 2001) and used as an example of standard two-phase case-control sampled data.
data(leprosy1)
A data frame with 42 observations on the following 5 variables.
leprosy |
A factor with levels no and yes. |
age |
A numeric vector indicating the mid-point of six 5-year age groups. |
scar |
A factor with levels no and yes. |
counts |
A numeric vector indicating the number of subjects with each observation. |
obstype |
A factor with levels retro and strata, compulsory for the function call. |
The leprosy data were obtained by sampling from the results of a population
cross-sectional study of people under 35 in Northern Malawi and represented as
a three-way contingency table in Clayton and Hills (1993). Those people with leprosy
were defined to be cases and the rest to be controls. The data were first categorized
into six 5-year age sampling strata and the numbers of cases and controls falling into
each stratum were observed. All cases have been chosen with an equal-sized control
group subsampled from the control population within each age stratum. The potential
risk factor that indicates the presence or absence of a BCG vaccination scar was then
observed.
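The sampling scheme described above can be sketched on simulated data. The block below is a rough illustration only: the population, its size and the 5% prevalence are invented, the data are kept at the individual level rather than as the aggregated counts stored in leprosy1, and the use of "retro" for second-phase subjects and "strata" for the rest is an assumption based on the factor levels listed for obstype.

```r
set.seed(1)

# Hypothetical population: six age strata and a rare binary outcome
N   <- 2000
pop <- data.frame(id      = seq_len(N),
                  age_grp = sample(1:6, N, replace = TRUE),
                  case    = rbinom(N, 1, 0.05))

# Within each age stratum keep every case and subsample an equal number
# of controls from that stratum's controls
sampled <- unlist(lapply(split(pop, pop$age_grp), function(s) {
  cases     <- s$id[s$case == 1]
  ctrl_pool <- s$id[s$case == 0]
  controls  <- ctrl_pool[sample.int(length(ctrl_pool), length(cases))]
  c(cases, controls)
}))

# Second-phase subjects are labelled "retro"; everyone else contributes
# only stratum-level information and is labelled "strata"
pop$obstype <- ifelse(pop$id %in% sampled, "retro", "strata")

phase2 <- pop[pop$obstype == "retro", ]
tab <- table(age_grp = phase2$age_grp, case = phase2$case)
print(tab)
```

In a real two-phase design the covariate of interest (here, the BCG scar) would be recorded only for the rows labelled "retro".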
The data are represented in three different formats in leprosy1,
leprosy2 and leprosy3. See "Description of the missreg
Library" for more details.
Description of the missreg Library, Wild and Jiang, 2007
data(leprosy1)
Fits a location-scale model of the form Y = eta + sigma*error to data with a single continuous Y-variable and two-phase missingness structure, and converts the results to binary-logistic parameters and odds-ratio estimates at appropriate cut-points of Y.
linbin2stg(formula1, yCuts, lower.tail = TRUE, weights = NULL, xstrata = NULL,
           data = list(), obstype.name = "obstype", fit = TRUE,
           xs.includes = FALSE, compactX = FALSE, start = NULL, Qstart = NULL,
           deltastart = NULL, int.rescale = TRUE, control = mlefn.control(...),
           control.inner = mlefn.control.inner(...), ...)
formula1 |
A symbolic description of the location model to be fitted, i.e. eta. |
yCuts |
Cutpoint(s) used to define the binary Y-variable for logistic regression. Can be a 1 x S matrix, where S is the number of X-strata. |
lower.tail |
If TRUE, define the cases being |
weights |
An optional vector of weights to be used in the fitting process. Should be |
xstrata |
Specify names of the stratification variables to be used, e.g. |
data |
A data frame containing all the variables required for analysis, including those for |
obstype.name |
Name of the variable specifying labels for observations by sampling and variable type: |
fit |
If |
xs.includes |
|
compactX |
If |
start |
Starting values for the regression parameters. Can be compulsory if the program cannot produce a valid starting value in some situations. |
Qstart |
An optional starting matrix for Pr(Ystratum=i|Xstratum=j). Can be compulsory if the program cannot produce a valid starting value in some situations. |
deltastart |
An optional starting matrix for Pr(X=xk|Xstratum=j). |
int.rescale |
If |
control |
Specify control parameters for the iterations in |
control.inner |
Specify control parameters for inner iterations nested within |
... |
Further arguments passed to or from related functions. |
This function is a simple application of locsc2stg that fits linear regression models to a continuous Y using a logistic error distribution. The results are then converted into inferences about the same odds-ratio parameters that logistic regression on the dichotomized binary outcome (case-control) would estimate, but with much greater efficiency.
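The conversion rests on a simple identity: with standard logistic errors, dichotomizing Y at a cut-point gives an exact logistic regression model. The sketch below works through it with hypothetical parameter values (b0, b1, sigma and cut_pt are invented) and assumes that cases are defined as Y at or below the cut-point. For cases defined by the upper tail, the signs of both binary coefficients flip. It illustrates the algebra only and is not the package's own conversion code.

```r
# Hypothetical location-scale model with standard logistic errors:
#   Y = b0 + b1 * x + sigma * error
b0 <- 3000; b1 <- 150; sigma <- 400
cut_pt <- 2700   # hypothetical cut-point; cases are Y <= cut_pt

# Pr(Y <= cut_pt | x) = plogis((cut_pt - b0 - b1 * x) / sigma), so the
# dichotomized outcome follows a logistic regression with these coefficients
bin_intercept <- (cut_pt - b0) / sigma
bin_slope     <- -b1 / sigma
odds_ratio    <- exp(bin_slope)   # odds ratio per unit increase in x

# Numerical check against the logistic distribution function
pr_low     <- function(x) plogis(cut_pt, location = b0 + b1 * x, scale = sigma)
logit_diff <- qlogis(pr_low(1)) - qlogis(pr_low(0))

print(c(bin_intercept = bin_intercept, bin_slope = bin_slope,
        odds_ratio = odds_ratio, logit_diff = logit_diff))
```

Only the intercept depends on the cut-point, so the slope and odds ratio are shared across cut-points while a single intercept is not defined when several are used.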
More detailed descriptions of this function can be found in
"Description of the missreg Library" (Wild and Jiang).
missReport |
Matrix containing information on deleted records with missing observations. |
StrReport |
Cross tabulation of counts for different levels of |
xStrReport |
Cross tabulation of counts for |
key |
Specify detailed classification for each of the X-strata. |
yCutsKey |
Specify the cutoff intervals for defined Y-strata within each X-stratum. |
fit |
|
error |
The error messages returned by |
coefficients |
Linear regression coefficients. |
loglk |
Log-likelihood returned from final |
score |
Score vector returned from final |
inf |
Observed information matrix returned from final |
fitted |
The fitted values of Y obtained from the model. |
cov |
The asymptotic covariance matrix (inverse of the information matrix) of linear parameter estimates. |
cor |
The asymptotic correlation matrix of linear parameter estimates. |
bcoefficients |
Binary regression coefficients converted from linear parameters. |
bcov |
The asymptotic variance of binary parameter estimates. |
The function summary.linbin2stg provides a complete summary of the regression results including the Wald tests and a regression panel for linear coefficients, a regression panel for binary coefficients, and associated odds-ratio estimates and confidence intervals. The related output functions (print.linbin2stg, summary.linbin2stg and print.summary.linbin2stg) currently have no
help files.
Also note that the intercept of binary coefficients will not be available when more than one cut-point of Y is used, e.g. different for each x-stratum.
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
data(lowbirth.ls)
lowbirth.ls$sex.age <- interaction(lowbirth.ls$sex,lowbirth.ls$gest)
yCuts <- matrix(c(2550,2650,2740,2840,2900,3010,3030,3140),nrow=1)
yCut1 <- mean(yCuts)

### Multiple yCuts;
z1 <- linbin2stg(birthwt~gest+mumht+bmi+ethnicdb+hyper+smoke, yCuts=yCuts,
                 xstrata=c("sex.age"), data=lowbirth.ls,
                 obstype.name=c("instudy"), xs.includes=FALSE)
summary(z1)

### Single yCut;
z2 <- linbin2stg(birthwt~gest+mumht+bmi+ethnicdb+hyper+smoke, yCuts=yCut1,
                 xstrata=c("sex.age"), data=lowbirth.ls,
                 obstype.name=c("instudy"), xs.includes=FALSE)
summary(z2)
Fits a location-scale model of the form Y = eta1 + exp(eta2)*error to
data with a single continuous Y-variable and two-phase missingness structure,
using the linear predictors eta1 and eta2 for specification of
the location and scale respectively.
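As a concrete reading of the model form, the sketch below simulates data from Y = eta1 + exp(eta2)*error with standard logistic errors and a constant log-scale. All coefficient values and variable names are hypothetical, and the final lm() call is just a quick sanity check on complete data, not a substitute for locsc2stg.

```r
set.seed(42)
n <- 1000
x <- rnorm(n)

eta1 <- 1 + 2 * x            # location linear predictor
eta2 <- rep(log(0.5), n)     # log-scale linear predictor (constant, scale 0.5)

# Location-scale response with standard logistic errors
y <- eta1 + exp(eta2) * rlogis(n)

# With fully observed data, ordinary least squares recovers the location slope
slope_hat <- unname(coef(lm(y ~ x))["x"])
print(slope_hat)
```

Modelling the scale on the log scale, as exp(eta2), keeps it positive whatever values the second linear predictor takes.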
locsc2stg(formula1, formula2, yCuts=NULL, weights=NULL, xstrata=NULL,
          data=list(), obstype.name="obstype", method="direct", fit=TRUE,
          errdistrn="logistic", errmodpars=6, xs.includes=FALSE,
          compactX=FALSE, compactY=TRUE, straty.maxnvals=20, start=NULL,
          Qstart=NULL, deltastart=NULL, int.rescale=TRUE,
          control=mlefn.control(...),
          control.inner=mlefn.control.inner(...), ...)
formula1 |
A symbolic description of the location model to be fitted (eta1). |
formula2 |
A symbolic description of the log-scale model to be fitted
(eta2). |
yCuts |
Cutpoints used to define Y-strata. Critical when
|
weights |
An optional vector of weights to be used in the fitting process.
Should be |
xstrata |
Specify names of the stratification variables to be used, e.g.
|
obstype.name |
Name of the variable specifying labels for observations
by sampling and variable type: |
data |
A data frame containing all the variables required for analysis,
including those for |
method |
Two methods are implemented: |
fit |
If |
errdistrn |
A specification for the error distribution. Three choices are provided:
standard logistic ( |
errmodpars |
Set parameter values for the error distribution. The default is 6 for Student's t distribution. |
xs.includes |
|
compactX |
If |
compactY |
If TRUE, limit the Y-values observed at the first phase
( |
straty.maxnvals |
If |
start |
Starting values for the regression parameters. Can be compulsory if the program cannot produce a valid starting value in some situations. |
Qstart |
An optional starting matrix for Pr(Ystratum=i|Xstratum=j). Can be compulsory if the program cannot produce a valid starting value in some situations. |
deltastart |
An optional starting matrix for Pr(X=xk|Xstratum=j). This
is only applicable to |
int.rescale |
If |
control |
Specify control parameters for the iterations in |
control.inner |
Specify control parameters for inner iterations nested
within |
... |
Further arguments passed to or from related functions. |
This function fits location-scale models to continuous Y using different error
distributions with various types of observations collected under different two-phase
sampling schemes. More detailed descriptions of this function can be found in
"Description of the missreg Library" (Wild and Jiang).
Two methods are implemented, treating Y either as categorical
("ycutmeth") or on a continuous scale ("direct"). The argument
yCuts is critical to the first approach but is only required for the second
approach when a starting value is needed. If yCuts is a vector, it defines
the Y-strata through the intervals (-infinity, yCuts, infinity). If yCuts
is a matrix, the number of columns indicates the number of strata used and you
can define different cutpoints for each stratum. If you want differing
numbers of cutpoints for different X-strata, pad out the bottom of any column
that is not full with NAs.
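The two forms of yCuts can be pictured with base R. In the sketch below the response values and cutpoints are hypothetical, and cut() is used only to show how a vector of cutpoints partitions Y into strata. Whether locsc2stg closes its intervals on the right, as cut() does by default, is not stated here and is an assumption of the illustration.

```r
y <- c(2400, 2600, 2800, 3100, 3500)   # hypothetical responses

# Vector form: one set of cutpoints shared by all X-strata
yCuts_vec <- c(2500, 3000)
ystrata   <- cut(y, breaks = c(-Inf, yCuts_vec, Inf))
print(table(ystrata))

# Matrix form: one column per X-stratum; the second stratum has a single
# cutpoint, so its column is padded at the bottom with NA
yCuts_mat <- cbind(c(2500, 3000), c(2700, NA))

# Interval breaks implied for each X-stratum, dropping the NA padding
breaks_by_stratum <- lapply(seq_len(ncol(yCuts_mat)), function(j) {
  col <- yCuts_mat[, j]
  c(-Inf, col[!is.na(col)], Inf)
})
print(breaks_by_stratum)
```

A vector of k cutpoints therefore yields k + 1 Y-strata, and the NA padding lets each X-stratum carry its own number of strata within one rectangular matrix.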
missReport |
Matrix containing information on deleted records with missing observations. |
StrReport |
Cross tabulation of counts for different levels of |
xStrReport |
Cross tabulation of counts for |
key |
Specify detailed classification for each of the X-strata. |
yCutsKey |
Specify the cutoff intervals for defined Y-strata within each X-stratum. |
fit |
|
error |
The error messages returned by |
coefficients |
The coefficients matrix with estimates, standard errors, z values and associated p-values. |
loglk |
Log-likelihood returned from final |
score |
Score vector returned from final |
inf |
Observed information matrix returned from final |
fitted |
The fitted values of Y obtained from the model. |
cov |
The asymptotic covariance matrix (inverse of the information matrix). |
cor |
The asymptotic correlation matrix. |
Qmat |
The estimated Pr(Ystratum=i|Xstratum=j) from the last iteration. |
deltamat |
The estimated |
The function summary.locsc2stg provides a complete summary of
the regression results including the Wald tests and a regression panel.
The related output functions (print.locsc2stg,
summary.locsc2stg and print.summary.locsc2stg) currently have no
help files.
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
data(lowbirth.ls)
lowbirth.ls$sex.age <- interaction(lowbirth.ls$sex,lowbirth.ls$gest)
yCuts <- matrix(c(2550,2650,2740,2840,2900,3010,3030,3140),nrow=1)

z1 <- locsc2stg(birthwt ~ gest + mumht + bmi+ethnicdb+hyper+smoke, ~1,
                yCuts=yCuts, xstrata=c("sex.age"), data=lowbirth.ls,
                obstype.name=c("instudy"), xs.includes=FALSE,
                method="ycutmeth")
summary(z1)

z2 <- locsc2stg(birthwt ~ gest + mumht + bmi+ethnicdb+hyper+smoke,~1,
                xstrata=c("sex.age"),data=lowbirth.ls,
                obstype.name=c("instudy"), xs.includes=FALSE,
                method="direct", start=z1$coefficients,
                compactX=TRUE, compactY=TRUE, straty.maxnvals=20)
summary(z2)

z2 <- locsc2stg(birthwt ~ gest + mumht + bmi+ethnicdb+hyper+smoke,~1,
                yCuts=yCuts, xstrata=c("sex.age"), data=lowbirth.ls,
                obstype.name=c("instudy"), xs.includes=FALSE,
                method="direct", start=z1$coefficients, Qstart=z1$Qmat,
                compactX=TRUE, compactY=TRUE, straty.maxnvals=20,
                control.inner=mlefn.control.inner(n.earlyit=3))
summary(z2)

z3 <- locsc2stg(birthwt ~ gest + mumht + bmi+ethnicdb+hyper+smoke,~1,
                xstrata=c("sex.age"),data=lowbirth.ls,
                obstype.name=c("instudy"), xs.includes=FALSE,
                method="direct", start=z2$coefficients,
                deltastart=z2$deltamat, compactX=TRUE, compactY=TRUE,
                straty.maxnvals=100)
summary(z3)
A subset of the data collected in the Auckland Birthweight Collaborative (ABC) Study.
data(lowbirth.bin)
A data frame with 1148 observations on the following 18 variables.
A factor defining the case (sga) and control (aga) status of the baby.
1=female or 2=male
A factor with levels retro and strata,
used as the obstype variable in function call
A factor with levels as class intervals of mother's height
A factor with levels A (Asian), E (Euro.),
M (Maori) or P (Pacifican)
Marital status of the mother
Mother's occupational group, 1 to 3 (3 is highest)
Height of the mother in cm
Weight of the mother in kg
Body mass index of the mother
Smoking status of the mother prior to pregnancy
Smoking variable from database
Mother's age at first pregnancy
Any hypertension (1=yes, 0=no)
Mother's educational level
Mother's age left school
As for mstrat with some levels combined
Number of subjects with each observation (frequency)
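The frequency variable means that one row can stand for several subjects with identical values. As a rough illustration of how such counts behave as frequency weights, the sketch below uses made-up data and an ordinary glm fit (not bin2stg); the data frame grouped and its column counts are hypothetical names introduced only for this example.

```r
# Made-up grouped data: each row stands for `counts` identical subjects.
grouped <- data.frame(
  y      = c(1, 0, 1, 0),
  x      = c(0, 0, 1, 1),
  counts = c(3, 7, 6, 4)
)

# Fit 1: use the counts as frequency weights.
fit_w <- glm(y ~ x, family = binomial, data = grouped, weights = counts)

# Fit 2: repeat every row `counts` times and fit without weights.
expanded <- grouped[rep(seq_len(nrow(grouped)), grouped$counts), ]
fit_e <- glm(y ~ x, family = binomial, data = expanded)

# Largest absolute difference between the two sets of coefficients.
coef_diff <- max(abs(coef(fit_w) - coef(fit_e)))
```

Both fits maximise the same likelihood, so the coefficients should agree up to numerical tolerance; the grouped form is simply more compact.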
The ABC study was conducted in 1995-1997 in order to find potential risk factors
associated with small-for-gestational-age babies in New Zealand. It was a
population-based case-control study with the cases being those babies with
their birthweights equal to or below the sex-specific 10th percentile for
gestational age in the population.
The lowbirth.bin data set is a semi-random subset of the original data.
Description of the missreg Library, Wild and Jiang, 2007.
A sub-function called by ML2Inf to supply the value and derivatives of
the first part of the profile loglikelihood, the part involving the model of interest,
in the discrete partition version.
MEtaProspModInf(theta,nderivs=2,y,x,wts=1,modelfn,off.set=0, ...)
theta |
Vector of the parameter values. |
nderivs |
Number of derivatives to be calculated, ranged from 0 (loglikelihood only) to 2 (information matrix). |
y |
The response of interest, can be either a vector or matrix. |
x |
A 3-dimensional array (R*C*M) specifying the covariates values,
with R the number of observations, C the length of |
wts |
An optional vector of weights ( |
modelfn |
A class of sub-functions called by |
off.set |
The offset provided in a matrix form (R*M) with R the number of observations and M the number of linear predictors used. |
... |
Further arguments passed to or from related functions. |
This sub-function implements prospective regression models with a fixed number M of
linear predictors. It calculates the value and derivatives of the first part of
the profile loglikelihood l*(theta,Q) within each s-stratum,
sum_{A(s)} n_i^{(s)} * log f(y_i^{(s)} | x_i^{(s)}; theta),
with respect to theta through the M linear predictors eta_{im} = o_{im} + x_{i(m)}^T*theta (m=1,...,M).
See "Description of the missreg Library" for all details.
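To make the weighted sum concrete, here is a minimal sketch for the simplest case: a logistic model with a single linear predictor (M=1). The function toy_prosp_inf is a hypothetical name, not part of missreg, and it takes x as an ordinary R-by-C matrix rather than the 3-dimensional array described above. It returns the same three components (loglk, score, inf) as the value list of this page.

```r
# Hypothetical helper: weighted Bernoulli/logit log-likelihood with offset.
toy_prosp_inf <- function(theta, y, x, wts = 1, off.set = 0, nderivs = 2) {
  eta <- as.vector(off.set + x %*% theta)   # linear predictor o_i + x_i^T theta
  p   <- plogis(eta)                        # fitted probabilities
  wts <- rep(wts, length.out = length(y))   # n_i, recycled if scalar
  out <- list(loglk = sum(wts * (y * eta - log1p(exp(eta)))))
  if (nderivs >= 1) out$score <- as.vector(crossprod(x, wts * (y - p)))
  if (nderivs >= 2) out$inf   <- crossprod(x, x * (wts * p * (1 - p)))
  out
}

# Simulated data, for illustration only.
set.seed(1)
x <- cbind(1, rnorm(50))
y <- rbinom(50, 1, plogis(x %*% c(-0.5, 1)))
res <- toy_prosp_inf(c(0, 0), y, x)
```

At theta = 0 every fitted probability is 0.5, which gives an easy check: the loglikelihood is -50*log(2) and the information matrix is t(x) %*% x / 4.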
A list with the following components
loglk |
Log-likelihood obtained from the current |
score |
Score vector obtained from the current |
inf |
Observed information matrix obtained from the current |
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
A sub-function called by MLdirectInf to provide the value, score vector
and information matrix at theta for the so-called profile loglikelihood
l_P(theta) of the form l(theta, delta) within each s-stratum
with stratified two-phase sampled data. It reduces to an unstratified approach when
nstrata=1.
ML2directInf(theta, nderivs = 2, modelfn, hmodelfn, x, y, Aposn, Acounts, Bposn, Bcounts, hvalue, Cmult, delta, off.set = matrix(0, dim(x)[1], dim(x)[3]), inxStrat, control.inner = mlefn.control.inner(...), ...)
theta |
Starting values for parameters |
nderivs |
Number of derivatives to be calculated. Either 0 (loglikelihood value only), 1 (also return score vector), or 2 (also return information matrix). |
modelfn |
A class of sub-functions called by |
hmodelfn |
A class of sub-functions called by |
x |
A 3-dimensional array (R*C*M) specifying the covariates values,
with R the number of observations, C the length of |
y |
The response of interest, can be either a vector or matrix. |
Aposn |
A vector specifying the positions of those observations with the set of complete (x, y)-values from s-stratum. |
Acounts |
A vector specifying the frequency of each observation ( |
Bposn |
A vector specifying the positions of those observations with the
x-values observed in s-stratum; |
Bcounts |
A vector specifying the frequency of each observation ( |
hvalue |
The |
Cmult |
The |
delta |
The |
off.set |
The offset provided in a matrix form (R*M) with R the number of observations and M the number of linear predictors used. |
inxStrat |
See |
control.inner |
Specify control parameters for inner iterations nested within
the |
... |
Further arguments passed to or from related functions. |
This is the core function in the direct approach to calculate the value, score vector
and observed information matrix at theta for the profile loglikelihood l_P(theta)
of the form l^(s)(theta,delta^(s)) within each s-stratum.
It is an inner function called by MLdirectInf.
A list with the following components.
loglk |
Log-likelihood value obtained from the current |
score |
Score vector obtained from the current |
inf |
Observed information matrix obtained from the current |
delta |
A vector of length J providing values for |
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
A sub-function called by MLInf to provide the value, score vector and
information matrix at theta for the so-called profile loglikelihood
l*(theta, Q) within each stratum with stratified two-phase
sampled data. It reduces to an unstratified approach when nstrata=1.
ML2Inf(theta, nderivs = 2, ProspModInf, StratModInf, x, y, Aposn, Acounts, Bposn, Bcounts, rvec, Qs, usage = "thetaonly", thetaparts = 0, paruse = "auto", inxStrat, off.set = matrix(0, dim(x)[1], dim(x)[3]), control.inner = mlefn.control.inner(...), ...)
theta |
Starting values for parameters |
nderivs |
Number of derivatives to be calculated. Either 0 (loglikelihood value only), 1 (also return score vector), or 2 (also return information matrix). |
ProspModInf |
A class of sub-functions called by |
StratModInf |
A class of sub-functions called by |
x |
See |
y |
See |
Aposn |
A vector specifying the positions in the data matrix of those observations that contribute to the A part of the loglikelihood. |
Acounts |
A vector specifying the frequency of each observation ( |
Bposn |
A vector specifying the positions in the data matrix of those observations that contribute to the B part of the loglikelihood. |
Bcounts |
A vector specifying the frequency of each observation ( |
rvec |
The |
Qs |
The |
usage |
Work with and report results for the following three conditions:
(1) |
thetaparts |
A vector of length 2 specifying the number of |
paruse |
The choice of using either |
inxStrat |
Optional to enable printing a diagnostic when |
off.set |
See |
control.inner |
Specify control parameters for inner iterations nested within
the |
... |
Further arguments passed to or from related functions. |
This is the core function of the discrete partition version. It calculates the value,
score vector and observed information matrix at theta for the so-called profile
loglikelihood l*^(s)(theta,Q^(s)). It is an inner function called by MLInf
to supply values of l* and its derivatives within each s-stratum.
See "Description of the missreg Library" for more details.
A list with the following components.
loglk |
Log-likelihood obtained from the current |
score |
Score vector obtained from the current |
inf |
Observed information matrix obtained from the current |
Qs |
A vector of length K providing values for |
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
An outer function of ML2directInf to provide the value, score vector and information
matrix at theta for the profile loglikelihood l_P(theta) of the form
l(theta,delta) with stratified two-phase sampled data. It reduces to an unstratified
approach when nstrata=1.
MLdirectInf(theta, nderivs = 2, deltamat = NULL, modelfn, hmodelfn, x, y, xStrat, Aposn, Acounts, Bposn, Bcounts, hvalue, Cmult, hxStrat, off.set = matrix(0, dim(x)[1], dim(x)[3]), extra = NULL, control.inner = mlefn.control.inner(...), ...)
theta |
Starting values for parameter |
nderivs |
Number of derivatives to be calculated. See |
deltamat |
The |
modelfn |
See |
hmodelfn |
See |
x |
See |
y |
See |
xStrat |
A vector of values 1 to S specifying the stratum membership of each observation. |
Aposn |
A vector specifying the positions of those observations with the set of complete (x,y)-values. |
Acounts |
A vector specifying the frequency of each observation ( |
Bposn |
A vector specifying the positions of those observations with the x-values
observed; |
Bcounts |
A vector specifying the frequency of each observation ( |
hvalue |
The |
Cmult |
The |
hxStrat |
A vector of values 1 to S specifying the stratum membership of
each |
off.set |
See |
extra |
Provides |
control.inner |
See |
... |
Further arguments passed to or from related functions. |
This is the direct function called by mlefn to calculate the value, score vector
and observed information matrix at theta for the so-called profile loglikelihood
l_P(theta) using the direct approach. It calls the inner function ML2directInf
to evaluate l^(s)(theta,delta^(s)) within each s-stratum.
A list with the following components.
loglk |
Log-likelihood obtained from the current |
score |
Score vector obtained from the current |
inf |
Observed information matrix obtained from the current |
extra |
A list providing updated |
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
A function to maximise, minimise or find stationary values of a (general) function. It was
originally written to maximise a loglikelihood function, so that is the language
employed. mlefn.control and mlefn.control.inner supply parameter values that control
the iterative process and reporting. They differ only in their defaults.
mlefn(theta, loglkfn, control=mlefn.control(...), ...)
mlefn.control(messg="", niter=20, tol=1e-08, guide="uphill", print.progress=2, max.eigenrat=0.05, n.earlyit=0, constrain="no", fixed=NULL, Aconstrain=NULL, cconstrain=NULL, ...)
mlefn.control.inner(messg="Inner:", niter=20, tol=1e-08, guide="auto", print.progress=0, max.eigenrat=0.05, n.earlyit=0, constrain="no", fixed=NULL, Aconstrain=NULL, cconstrain=NULL, ...)
theta |
Starting values for the parameters of the loglikelihood function. |
loglkfn |
An inner function to compute the loglikelihood and its
derivatives. The values returned by this function must be a list with
|
messg |
A labelling string to be printed as a part of warnings etc.
Useful with nested calls to |
niter |
Maximum number of iterations used. The default is 20. |
tol |
Level of tolerance for checking the convergence. |
guide |
Specification of the direction for convergence with the following choices:
|
print.progress |
A numeric value used to control the printing of error messages (if any); 0 should be used if no printing is required. |
max.eigenrat |
An argument used in the inner function |
n.earlyit |
Number of iterations to be treated as |
constrain |
Specification of constraints on the parameter estimates with the following choices:
|
fixed |
A vector specifying the parameters to be fixed, indicated by their orders in |
Aconstrain |
an |
cconstrain |
A vector specifying the values of those "fixed" parameters. |
control |
to pass control options |
... |
Further arguments passed to or from related functions. |
This is the base function to maximise, minimise or find stationary values for theta
using the provided loglkfn function. All semiparametric maximum likelihood approaches
proposed in the missreg library require this function to obtain maximum likelihood
estimates of the parameters. See "Description of the missreg Library" for more details.
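The contract between the optimiser and loglkfn can be illustrated with a plain Newton-Raphson loop. This is only a sketch under simplifying assumptions, not the mlefn algorithm: it ignores the guide, max.eigenrat, n.earlyit and constraint options entirely. The names toy_newton and logit_loglk are hypothetical; the only feature borrowed from this page is that loglkfn returns a list with loglk, score and inf.

```r
# Hypothetical plain Newton-Raphson driver (no step control, no constraints).
toy_newton <- function(theta, loglkfn, niter = 20, tol = 1e-8, ...) {
  for (it in seq_len(niter)) {
    cur   <- loglkfn(theta, ...)
    step  <- solve(cur$inf, cur$score)   # solves inf %*% step = score
    theta <- theta + as.vector(step)
    if (max(abs(step)) < tol) break      # stop once the update is negligible
  }
  c(list(theta = theta, counter = it), loglkfn(theta, ...))
}

# Example loglkfn: ordinary logistic regression loglikelihood.
logit_loglk <- function(theta, y, x) {
  eta <- as.vector(x %*% theta)
  p   <- plogis(eta)
  list(loglk = sum(y * eta - log1p(exp(eta))),
       score = as.vector(crossprod(x, y - p)),
       inf   = crossprod(x, x * (p * (1 - p))))
}

# Simulated data, for illustration only.
set.seed(2)
x <- cbind(1, rnorm(200))
y <- rbinom(200, 1, plogis(x %*% c(-0.3, 0.8)))
fit <- toy_newton(c(0, 0), logit_loglk, y = y, x = x)
glm_coef <- unname(coef(glm(y ~ x[, 2], family = binomial)))
```

On this well-behaved problem the sketch should agree with glm; harder profile likelihoods are presumably where the extra control options of mlefn become relevant.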
A list with the following components.
theta |
Updated parameter estimates at this iteration. |
loglk |
Log-likelihood obtained from the current |
score |
Score vector obtained from the current |
inf |
Observed information matrix obtained from the current |
constrscore |
Constrained score vector if |
constrinf |
Constrained observed information matrix if |
counter |
Number of iterations performed. |
error |
A numeric value indicating the types of errors during iterations; a value of 0 indicates no error. |
Chris Wild, Yannan Jiang
Nonlinear Regression, Seber and Wild, 1989. Wiley: New York.
Description of the missreg Library, Wild and Jiang, 2007.
An outer function of ML2Inf to provide the value, score vector and information matrix
at theta for the so-called profile loglikelihood l_P(theta) of the form
l*(theta, Q) with stratified two-phase sampled data. It reduces to an unstratified
approach when nstrata=1.
MLInf(theta, nderivs = 2, ProspModInf, StratModInf, x, y, Aposn, Acounts, Bposn, Bcounts, rmat, Qmat, xStrat = rep(1, dim(x)[1]), extra = NULL, off.set = matrix(0, dim(x)[1], dim(x)[3]), control.inner = mlefn.control.inner(...), ...)
theta |
Starting values for parameter |
nderivs |
Number of derivatives to be calculated. |
ProspModInf |
See |
StratModInf |
See |
x |
See |
y |
See |
Aposn |
See |
Acounts |
See |
Bposn |
See |
Bcounts |
See |
rmat |
The |
Qmat |
The |
xStrat |
A vector of values 1 to S specifying the stratum membership of each observation. |
extra |
Provides Qmat from last iteration as starting values for next
inner iterative loop in |
off.set |
See |
control.inner |
See |
... |
Further arguments passed to or from related functions. |
This is the direct function called by mlefn to calculate the value,
score vector and observed information matrix at theta for the so-called
profile loglikelihood l_P(theta) using the discrete partition version.
It calls the inner function ML2Inf to evaluate
l*^(s)(theta,Q^(s)) within each s-stratum.
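The outer/inner split can be pictured as a loop over strata that accumulates the per-stratum results. The sketch below assumes the stratum contributions are simply added, as is usual for independent strata. It uses a deliberately trivial inner function (a unit-variance normal loglikelihood) in place of ML2Inf, which additionally handles Q^(s). Both toy_stratified_inf and normal_inner are hypothetical names.

```r
# Hypothetical outer function: add up loglk, score and inf over strata.
toy_stratified_inf <- function(theta, y, x, xStrat, inner) {
  total <- list(loglk = 0,
                score = numeric(length(theta)),
                inf   = matrix(0, length(theta), length(theta)))
  for (s in sort(unique(xStrat))) {
    rows <- xStrat == s
    part <- inner(theta, y[rows], x[rows, , drop = FALSE])
    total$loglk <- total$loglk + part$loglk
    total$score <- total$score + part$score
    total$inf   <- total$inf + part$inf
  }
  total
}

# Stand-in inner function: normal loglikelihood with variance 1 (up to a constant).
normal_inner <- function(theta, y, x) {
  r <- as.vector(y - x %*% theta)
  list(loglk = -0.5 * sum(r^2),
       score = as.vector(crossprod(x, r)),
       inf   = crossprod(x))
}

# Simulated data, for illustration only.
set.seed(3)
x <- cbind(1, rnorm(30))
y <- rnorm(30)
xStrat <- rep(1:3, each = 10)
strat <- toy_stratified_inf(c(0.1, -0.2), y, x, xStrat, normal_inner)
whole <- normal_inner(c(0.1, -0.2), y, x)
```

Because this stand-in likelihood is additive over observations, the stratified total equals the unstratified result, and with a single stratum the loop reduces to one inner call.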
A list with the following components.
loglk |
Log-likelihood obtained from the current |
score |
Score vector obtained from the current |
inf |
Observed information matrix obtained from the current |
extra |
A list providing updated Qmat values from previous iteration. |
Chris Wild, Yannan Jiang
Description of the missreg Library, Wild and Jiang, 2007.
Fits random intercept models to clustered binary data with the two-phase missingness structure.
rclusbin(formula, data, weights=NULL, ClusInd=NULL, IntraClus=NULL, xstrata=NULL, ystrata=NULL, obstype.name="obstype", NMat=NULL, xs.includes=FALSE, MaxInClus=NULL, rmsingletons=FALSE, retrosamp="proband", gamma=NULL, nzval0=20, fit=TRUE, devcheck=FALSE, linkname="logit", start=NULL, Qstart=NULL, sigma=NULL, control=mlefn.control(...), control.inner=mlefn.control.inner(...), ...)
formula |
A symbolic description of the model to be fitted. |
data |
A data frame containing all the variables required for analysis, including those for |
weights |
An optional vector of weights to be used in the fitting process. Should be |
ClusInd |
Name of a vector in the data frame specifying cluster membership. Can be |
IntraClus |
Name of a vector in the data frame specifying intra-cluster sequence of individual subjects in a cluster. The one with the smallest i.d. is treated as the proband who was originally sampled into a study. |
xstrata |
Specify names of the stratification variables to be used, e.g. |
ystrata |
Specify name of the variable defining the Y-strata. Compulsory when gamma probabilities are used (see Details for more descriptions). |
obstype.name |
Name of the variable specifying labels for observations by sampling and variable type: |
NMat |
Population counts in a matrix form with rows and columns corresponding to Y-strata and X-strata respectively. Should not be provided when there is any observation of the type |
xs.includes |
|
MaxInClus |
A value specifying the maximum number of individuals allowed in a cluster. Set to |
rmsingletons |
If |
retrosamp |
Three retrospective sampling schemes can be applied based on the Y-status of all subjects in the same cluster: |
gamma |
A vector of length 2 specifying the probabilities that individuals belong to |
nzval0 |
Number of points to calculate the zeros and weights needed for Gauss-Hermite quadrature. The default is 20. |
fit |
If |
devcheck |
If |
linkname |
A specification for the model link function. Three choices are provided: |
start |
Starting values for the regression parameters. The first |
Qstart |
An optional starting matrix for Pr(Ystratum=i|Xstratum=j). Can be compulsory if the program cannot produce a valid starting value in some situations. |
sigma |
An optional starting value for |
control |
Specify control parameters for the iterations in |
control.inner |
Specify control parameters for inner iterations nested within |
... |
Further arguments passed to or from related functions. |
This function fits binary regression models with a random intercept of the form a_i=e^{w*eps_i} where w=log(sigma) and eps_i is standard normal for each cluster, along with a linear predictor eta_{ij}=x_{ij}^T*beta for the subject j in the i^th cluster.
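Because eps_i is standard normal, the likelihood of each cluster involves an integral over eps_i, and nzval0 sets the number of Gauss-Hermite points used to approximate it. The following sketch shows one standard way to obtain such zeros and weights (the Golub-Welsch eigenvalue method) and to use them for an expectation over a standard normal variable. It is not taken from the missreg source; gauss_hermite and normal_expectation are hypothetical names.

```r
# Hypothetical helper: Gauss-Hermite nodes and weights for weight exp(-x^2),
# computed from the eigen-decomposition of the Jacobi matrix (Golub-Welsch).
gauss_hermite <- function(n) {
  k <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- sqrt(k / 2)   # super-diagonal
  J[cbind(k + 1, k)] <- sqrt(k / 2)   # sub-diagonal
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# E[g(eps)] for eps ~ N(0, 1), via the change of variable eps = sqrt(2) * x.
normal_expectation <- function(g, n = 20) {
  gh <- gauss_hermite(n)
  sum(gh$weights * g(sqrt(2) * gh$nodes)) / sqrt(pi)
}

m2   <- normal_expectation(function(e) e^2)     # second moment of N(0, 1)
mgf1 <- normal_expectation(function(e) exp(e))  # E[exp(eps)]
```

The two quantities have known closed forms, 1 and exp(1/2), which makes them convenient accuracy checks; more points may be needed when the integrand is sharply peaked.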
The function can be applied to both prospective and retrospective data with various types of observations collected at different two-phase sampling schemes. Three retrospective samplings are considered with the Y-strata defined as:
(1) the case-control status of the proband only ("proband");
(2) the case-control status of all members in the same cluster ("allcontrol"). If any member is a case, the cluster belongs to Y-strata=1; otherwise Y-strata=0;
(3) the case-control status of all members in the same cluster plus the gamma probabilities ("gamma"). The conditional probability of Y-strata=1 depends on sum_j{Y_j}=1 (with gamma_1 probability) or sum_j{Y_j}>1 (with gamma_2 probability).
Here Y_j indicates case-control status (1 for a case and 0 for a control) of the j^{th} individual in a cluster.
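The three definitions can be written out for a single cluster as follows. This is a sketch of the rules as stated above, not package code: cluster_ystratum and gamma_prob are hypothetical names, the vector y is assumed to be ordered so that the proband comes first, and the probability 0 for a cluster with no cases under the "gamma" scheme is an assumption not stated in the text.

```r
# Hypothetical helper: Y-stratum of one cluster from its members' case status.
# y is a 0/1 vector with the proband in position 1 (assumed ordering).
cluster_ystratum <- function(y, retrosamp = c("proband", "allcontrol")) {
  retrosamp <- match.arg(retrosamp)
  switch(retrosamp,
    proband    = as.integer(y[1] == 1),     # status of the proband only
    allcontrol = as.integer(any(y == 1)))   # 1 if any member is a case
}

# Hypothetical helper: Pr(Y-strata = 1) under the "gamma" scheme.
gamma_prob <- function(y, gamma) {
  n_cases <- sum(y)
  if (n_cases == 0) 0               # assumption: no cases means Y-strata = 0
  else if (n_cases == 1) gamma[1]   # exactly one case: gamma_1
  else gamma[2]                     # more than one case: gamma_2
}

y <- c(0, 1, 0)                                 # proband is a control
s_proband    <- cluster_ystratum(y, "proband")
s_allcontrol <- cluster_ystratum(y, "allcontrol")
p_gamma      <- gamma_prob(c(0, 1, 1), gamma = c(0.2, 0.9))
```

The same cluster can therefore fall into different Y-strata depending on the scheme, which is why retrosamp has to match the way the data were actually sampled.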
http://www.stat.auckland.ac.nz/~wild
Description of the missreg Library, Wild and Jiang, 2007.
data(brainpairs)
brainpairs$obstype <- rep("retro", dim(brainpairs)[1])
z2 <- rclusbin(bt ~ ep + ca, ClusInd="id", IntraClus="relid", data=brainpairs)
summary(z2)

data(rdat00)
z3 <- rclusbin(y~x, ClusInd="cluster", data=rdat00, retrosamp="allcontrol")
summary(z3)
Fits random intercept models to clustered binary data collected under case-control sampling, where interest lies in a binary response (Y) that is related to the sampling variable (Z).
rclusbin2(formula1, formula2, weights=NULL, ClusInd.name=NULL, IntraClus.name=NULL, yname, xstrata=NULL, ystrata, obstype.name="obstype", data, NMat=NULL, xs.includes=FALSE, MaxInClus=NULL, rmsingletons=FALSE, retrosamp=TRUE, nzval0=20, fit=TRUE, devcheck=FALSE, linkname="logit", start=NULL, Qstart=NULL, sigma=NULL, paruse="xis", control=mlefn.control(...), control.inner=mlefn.control.inner(...), ...)
formula1 |
A symbolic description of the random intercept model to be fitted, i.e. the model of interest. |
formula2 |
A symbolic description of the auxiliary model to be fitted, between the sampling (case-control) variable and the binary response of interest. |
weights |
An optional vector of weights to be used in the fitting process. Should be |
ClusInd.name |
Name of a vector in the data frame specifying cluster membership. Can be |
IntraClus.name |
Name of a vector in the data frame specifying intra-cluster sequence of individual subjects in a cluster. The one with the smallest i.d. is treated as the proband who was originally sampled into a study. |
yname |
Name of the binary response variable of interest in the data frame. Must be specified. |
xstrata |
Specify names of the stratification variables to be used, e.g. |
ystrata |
Specify name of the variable defining the case and control strata. |
obstype.name |
Name of the variable specifying labels for observations by sampling and variable type: |
data |
A data frame containing all the variables required for analysis, including those for |
NMat |
Population counts in a matrix form with rows and columns corresponding to case-control strata and X-strata respectively. Should not be provided when there is any observation of the type |
xs.includes |
|
MaxInClus |
A value specifying the maximum number of individuals allowed in a cluster. Set to |
rmsingletons |
If |
retrosamp |
As the default, must be |
nzval0 |
Number of points to calculate the zeros and weights needed for Gauss-Hermite quadrature. The default is 20. |
fit |
If |
devcheck |
If |
linkname |
A specification for the model link function. Three choices are provided: |
start |
Starting values for all regression parameters. |
Qstart |
An optional starting matrix for Pr(Zstratum=i|Xstratum=j). Can be compulsory if the program cannot produce a valid starting value in some situations. |
sigma |
An optional starting value for |
paruse |
As the default, must be |
control |
Specify control parameters for the iterations in |
control.inner |
Specify control parameters for inner iterations nested within |
... |
Further arguments passed to or from related functions. |
To be added.
http://www.stat.auckland.ac.nz/~wild
Longitudinal Studies of Binary Response Data Following Case-Control and Stratified Case-Control Sampling: Design and Analysis, Schildcrout and Rathouz, BIOMETRICS 2009.
data(adhd)
head(adhd)
adhd$obstype <- rep("retro", dim(adhd)[1])
adhd$probandS <- 2 - adhd$proband   # as 1/2 for case/control
adhd$sexF <- adhd$sex-1             # as 1/0 for female/male
adhd$wave1 <- ifelse(adhd$wave==1, 1, 0)
adhd$wave2 <- ifelse(adhd$wave==2, 1, 0)
adhd1 <- adhd[adhd$wave==1,]

z0 <- glm(proband ~ adhd, family=binomial, data=adhd1)
z0$coefficients

nMat <- ftable(adhd1$sex~adhd1$probandS)   # 1=male; 2=female;
nMat

### Sampling ratios for boys/girls (Schildcrout & Rathouz)
pi_ctF <- 1/22.6
pi_ctM <- 1/22.4
NMat <- cbind(c(113, 96/pi_ctM), c(25, 21/pi_ctF))

z <- rclusbin2(adhd ~ wave1+wave2+wave+sexF+african+other+wave*sexF+wave*african,
               proband.1~adhd.1, ClusInd.name="id", IntraClus.name="wave",
               yname="adhd", ystrata="probandS", xstrata="sex", data=adhd,
               NMat=NMat, nzval0=40, control=mlefn.control(niter=100))
summary(z)
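The NMat step in the example can be read as scaling counts up by inverse sampling fractions. The sketch below repeats that arithmetic on its own, using only the numbers that appear in the example. The row and column labels are an interpretation based on the example's comments (probandS coded 1/2 for case/control, sex coded 1=male, 2=female) and are assumptions, not package output.

```r
# Sampling fractions quoted in the example (boys and girls).
pi_ctM <- 1 / 22.4
pi_ctF <- 1 / 22.6

# Rows: case-control strata; columns: X-strata (sex), as described for NMat.
# Second-row counts are divided by the sampling fraction, i.e. scaled up.
NMat <- cbind(c(113, 96 / pi_ctM), c(25, 21 / pi_ctF))

# Labels added for readability only (assumed reading of the coding).
dimnames(NMat) <- list(probandS = c("1 (case)", "2 (control)"),
                       sex      = c("1 (male)", "2 (female)"))
NMat
```

Dividing a count by a sampling fraction of 1/22.4 is the same as multiplying it by 22.4, so the second row holds implied population totals rather than observed counts.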