--- title: "Race/ethnicity in YRBS" author: "Thomas Lumley" date: "03/09/2019" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Race/ethnicity in YRBS} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` The `usethnicity` data set contains variables on race and ethnic identification from the 2017 Youth Risk Behaviour Survey, together with two variables on smoking behaviour. The YRBS is a multistage cluster-sampled survey, so valid inference about associations requires using survey design information. This subset of variables without weights is useful only for demonstration purposes. ```{r} library(rimu) data(usethnicity) head(usethnicity) ``` Question 4 asks *Are you Hispanic or Latino?*, and Question 5 asks for any of - A. American Indian or Alaska Native, - B. Asian, - C. Black or African American, - D. Native Hawaiian or Other Pacific Islander, - E. White that apply. In the data set, these five letters are pasted together into a single variable. We need to split `Q5` into its component letters. The \code{as.mr} method for character strings does this ```{r} race<-as.mr(usethnicity$Q5,"") mtable(race) ``` There's a spurious `" "` category from the string splitting, and the values `F`, `G`, and `H` are also invalid, so we need to remove them ```{r} race<-mr_drop(race,c(" ","F","G","H")) mtable(race) ``` We might want easier-to-recognise names for the categories ```{r} race <- mr_recode(race, AmIndian="A",Asian="B", Black="C", Pacific="D", White="E") ``` Now, Hispanic/Latino ethnicity is asked in a separate question. We convert it via the `as.mr` method for logical vectors, and then combine it with `race` ```{r} hispanic<-as.mr(usethnicity$Q4==1, "Hispanic") ethnicity<-mr_union(race, hispanic) ethnicity[101:120] ``` The `plot` method shows co-occurence of the various race/ethnicity terms ```{r} plot(ethnicity,nsets=6) ``` Tabulations against other factor or multiple-response variables are possible with `mtable`. Note that `mtable` shows frequencies for each category; use `as.character` to get frequencies for combinations -- do not use `as.factor`, which is not generic and so cannot have a `mr` method. ```{r} mtable(ethnicity, usethnicity$QN30) table(ethnicity %has% "Black", usethnicity$QN30) table(ethnicity %hasonly% "Black", usethnicity$QN30) table(as.character(ethnicity), usethnicity$QN30) ```