Title: | Responses in Multiplex |
---|---|
Description: | Tools for manipulating, exploring, and visualising multiple-response data, including scored or ranked responses. Conversions to and from factors, lists, strings, matrices; reordering, lumping, flattening; set operations; tables; frequency and co-occurrence plots. |
Authors: | Thomas Lumley [aut, cre], Annie Cohen [ctb] |
Maintainer: | Thomas Lumley <[email protected]> |
License: | GPL-3 |
Version: | 0.7 |
Built: | 2024-11-16 03:03:52 UTC |
Source: | https://github.com/tslumley/rimu |
Constructs mr objects representing multiple-choice questions where more than one choice is allowed.
as.mr(x, ...)
## S3 method for class 'logical'
as.mr(x,name=deparse(substitute(x)),...)
## S3 method for class 'list'
as.mr(x, sort.levels=TRUE,...,levels=NULL)
## S3 method for class 'factor'
as.mr(x, sort.levels=FALSE,...)
## S3 method for class 'data.frame'
as.mr(x, sort.levels=FALSE,...,na.rm=TRUE)
## S3 method for class 'character'
as.mr(x, sep=", ", sort.levels=TRUE,..., levels=NULL)
## Default S3 method:
as.mr(x, sort.levels=TRUE, levels=unique(x),...)
## S3 method for class 'ms'
as.mr(x,...)
x |
Object to be converted to class mr |
... |
for compatibility; not used |
sort.levels |
put the levels of the |
levels |
optional character vector of the permitted levels |
name |
level name (for a vector) or vector of level names to replace the column names (for a matrix) |
na.rm |
If |
sep |
Regular expression for splitting the string |
The internal representation of mr objects is as a logical matrix with the levels as column names.
The method for logical x coerces a single vector to a one-column matrix, and then applies the name argument as the column name. Given a matrix, the name argument is optional and replaces the existing column names; the default is not used.
The method for list x takes a list of character vectors that represent the levels present for one observation. The method for strings splits the string at the supplied separator and then uses the list method.
The method for factor x produces an mr object with the factor levels as levels. Each observation will have only one value.
The data.frame method works for logical or numeric columns of a data frame. Zero or negative values are treated as 'not present', positive values as 'present'. Optionally, NA values are coded as 'not present', which is useful when the data frame was created by reshape or dplyr::spread.
The method for ms objects simply drops the score/rank information.
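The logical-matrix representation described above can be illustrated with a minimal base-R sketch. The helper `list_to_indicator` is hypothetical and is not part of the package; it only mimics the idea of the list and character methods (one row per observation, one column per level) and makes no claim about how as.mr is actually implemented.

```r
# Hypothetical helper, not part of rimu: build a logical indicator matrix
# from a list of character vectors (one list element per observation).
list_to_indicator <- function(x, levels = sort(unique(unlist(x)))) {
  m <- matrix(FALSE, nrow = length(x), ncol = length(levels),
              dimnames = list(NULL, levels))
  for (i in seq_along(x)) {
    # TRUE wherever the level is among the values recorded for observation i
    m[i, ] <- levels %in% x[[i]]
  }
  m
}

nzbirds_list <- list(c("kea", "tui"), c("kea", "ruru", "kaki"), c("ruru"))
ind <- list_to_indicator(nzbirds_list)
ind

# Strings are first split at the separator, then treated like a list
ind_from_strings <- list_to_indicator(strsplit(c("kea, tui", "ruru"), ", "))
ind_from_strings
```

Each row has as many TRUE entries as the observation has levels, which is the property that the counting and tabulating functions later in this manual rely on.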
Object of class mr
nzbirds_list<-list(c("kea","tui"), c("kea","ruru","kaki"), c("ruru"),
                   c("tui","ruru"), c("tui","kea","ruru"), c("tauhou","kea"))
nzbirds_list
as.mr(nzbirds_list)
as.mr(c("kea, tui","kea, ruru, kaki","ruru","tui, ruru"))
data(nzbirds)
nzbirds
as.mr(nzbirds)
data(ethnicity)
ethnicity
as.logical(ethnicity)
as.mr(as.logical(ethnicity))
The internal representation is as a numeric matrix with 0 when a level is not present and the non-zero rank or score when it is present. The data.frame and matrix methods use the numeric values of x, and by default set NA values to 'not present'. The list method takes a list with a character vector for each observation and uses the position in the list as the rank/score. The character method splits the string at the separators to give a list and uses the list method.
The mr method uses a score of 1 whenever the level is present.
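As a rough illustration of the score-matrix idea, the following base-R sketch turns a list of character vectors into a numeric matrix in which each entry is the position of the level within its observation and 0 means 'not present'. The helper `list_to_scores` is hypothetical and is not the package's implementation.

```r
# Hypothetical helper, not part of rimu: position within each observation
# becomes the rank/score; 0 means the level is not present.
list_to_scores <- function(x, levels = unique(unlist(x))) {
  m <- matrix(0, nrow = length(x), ncol = length(levels),
              dimnames = list(NULL, levels))
  for (i in seq_along(x)) {
    obs <- x[[i]]
    m[i, obs] <- seq_along(obs)
  }
  m
}

scores <- list_to_scores(list(c("kea", "tui"), c("kea", "ruru", "kaki"), c("ruru")))
scores

# Dropping the scores leaves a logical presence matrix
present <- scores > 0
present
```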
as.ms(x, ...)
## S3 method for class 'list'
as.ms(x,...,levels=NULL)
## S3 method for class 'data.frame'
as.ms(x,...,na.rm=TRUE)
## S3 method for class 'matrix'
as.ms(x,...,na.rm=TRUE)
## S3 method for class 'mr'
as.ms(x,...)
## S3 method for class 'character'
as.ms(x,sep=", ", ...,levels=NULL)
x |
object to be converted |
... |
for compatibility; not used. |
levels |
Optional character vector giving the permitted levels |
na.rm |
Convert |
sep |
Regular expression for splitting the character string |
Object of class ms
nzbirds_list<-list(c("kea","tui"), c("kea","ruru","kaki"), c("ruru"),
                   c("tui","ruru"), c("tui","kea","ruru"), c("tauhou","kea"))
nzbirds_list
(msbirds<-as.ms(nzbirds_list))
(bird_mat <- unclass(msbirds))
as.ms(bird_mat)
The vmr class wraps the mr class using the vctrs package, for compatibility with tidyverse tbl_df objects (tibbles).
as.vmr(x, ...)
new_vmr(x, levels = unique(do.call(c, x)))
x |
For |
... |
not used |
levels |
the permitted levels for the object |
These objects need the vctrs and pillar packages to work, and need the tibble package to be useful.
An object of class vmr
The internals vignette for internal structure
if (requireNamespace("vctrs", quietly=TRUE)){
  data(nzbirds)
  nzbirds
  tidybirds<-as.vmr(nzbirds, na.rm=TRUE)
  tidybirds
}
Counts of observations for 12 bird species by US county and Canadian province in the Great Backyard Bird survey. These birds were randomly sampled from the much larger number in the full data set. See the vignette for more details.
data("birds")
A data frame with 3046 observations on the following 13 variables.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
location
a character vector
data(birds)
birds<-as.ms(birds[,1:12],na.rm=TRUE)
mtable(as.mr(birds))
The statistical standard for collecting ethnicity data requires that respondents can mark all that are applicable. The level 1 values are "Māori", "Pacific Peoples" (i.e., Pacific Island ethnicities), "Asian", "European", and "MELAA" (Middle Eastern, Latin American, and African). This is artificial data.
data("ethnicity")
An object of class mr
data(ethnicity)
ethnicity
These functions perform several small housekeeping tasks. mr_count counts the number of levels present for each individual. mr_na sets NA values to something else, ms_na sets them to 0 (i.e., not present), and mr_drop and ms_drop drop some levels from the object.
mr_count(x, na.rm = TRUE)
mr_drop(x, levels,...)
ms_drop(x, levels)
mr_na(x, na=TRUE)
ms_na(x)
x |
|
na.rm |
Remove |
levels |
character vector of levels to remove |
na |
Value ( |
... |
not used |
An integer vector for mr_count; an object of class mr or ms for the other functions.
data(usethnicity)
race<-as.mr(strsplit(as.character(usethnicity$Q5),""))
mtable(race)
race<-mr_drop(race,c(" ","F","G","H"))
mtable(race)
## to keep just specified levels use [
mtable(race[,c("A","D")])
## How many do people identify with
table(mr_count(race))
data(nzbirds)
seenbirds<-as.mr(nzbirds>0)
countbirds<-mr_count(seenbirds)
## How many types of birds were seen
table(countbirds)
data(ethnicity)
ethnicity
mr_na(ethnicity, FALSE)
Convert a multiple-response object into a factor using a supplied ordering. Each observation is assigned its first level in the ordering. That is, an observation that has priorities[1] as one of its levels is assigned that value. An observation that does not have priorities[1] as one of its levels, but does have priorities[2], is assigned priorities[2], and so on down the ordering.
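The priority rule can be sketched in base R on a plain logical matrix. The function `flatten_by_priority` is a hypothetical stand-in used only for illustration; it assumes every priority is a column name and leaves observations with none of the levels as NA, which may differ from what mr_flatten does.

```r
# Hypothetical sketch of prioritised flattening on a logical matrix.
flatten_by_priority <- function(m, priorities) {
  out <- rep(NA_character_, nrow(m))
  # Walk from lowest to highest priority so that higher priorities overwrite
  for (lev in rev(priorities)) {
    out[m[, lev]] <- lev
  }
  factor(out, levels = priorities)
}

m <- matrix(c(TRUE,  TRUE,  FALSE,
              FALSE, TRUE,  TRUE,
              FALSE, FALSE, TRUE,
              FALSE, FALSE, FALSE),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("Maori", "Asian", "European")))

flat <- flatten_by_priority(m, c("Maori", "Asian", "European"))
flat
```

The first observation has both "Maori" and "Asian" and is assigned "Maori"; the second has "Asian" and "European" and is assigned "Asian".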
mr_flatten(x, priorities, sort=FALSE)
x |
|
priorities |
Character vector of levels. |
sort |
if |
A factor
data(ethnicity)
ethnicity
## NZ 'prioritised ethnicity'
priority<-c("Maori", "Pacific", "Asian", "European/Other")
eth <- mr_na(mr_recode(ethnicity, `European/Other`="European", `European/Other` = "MELAA"), FALSE)
mr_flatten(eth, priority)
mr_flatten(eth, priority, sort=TRUE)
mr_inorder and ms_inorder use the order in which the levels first appear in the data (which is invariant to locale), while mr_inseq and ms_inseq sort alphabetically (for the current locale). mr_infreq sorts by frequency, and ms_inscore applies a function to the values in each level; one such function is mean0, which takes the mean of non-zero values. Finally, ms_reorder and mr_reorder use some function of a second variable computed on the observations where each level is present.
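A minimal base-R sketch of two of these orderings, applied to the columns of a plain score matrix. The helper `mean_nonzero` is a hypothetical stand-in for the behaviour described for mean0, and the sort directions (most frequent first, lowest mean score first) are assumptions made for illustration rather than documented behaviour.

```r
scores <- matrix(c(1, 2, 0,
                   0, 1, 0,
                   2, 1, 1,
                   0, 1, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, c("kea", "ruru", "tui")))

# Hypothetical stand-in for mean0: mean of the non-zero values only
mean_nonzero <- function(y) mean(y[y != 0])

# Order levels by how often they are present (assumed: most frequent first)
by_freq <- scores[, order(colSums(scores > 0), decreasing = TRUE)]

# Order levels by a per-level summary of the scores (assumed: smallest first)
by_score <- scores[, order(apply(scores, 2, mean_nonzero))]

colnames(by_freq)
colnames(by_score)
```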
mr_inorder(x,...)
ms_inorder(x)
mr_inseq(x,...)
ms_inseq(x)
mr_infreq(x,na.rm=TRUE,...)
ms_infreq(x)
ms_inscore(x, fun=mean0)
mean0(y)
mr_reorder(x, v, fun=median,...)
ms_reorder(x, v, fun=median)
x |
|
na.rm |
Remove |
v , fun
|
Sort levels of |
y |
numeric vector |
... |
not used |
Object of class mr
These are based on the reordering functions for factors in the forcats package.
data(ethnicity)
mr_infreq(ethnicity)
mr_inseq(ethnicity)
data(nzbirds)
mtable(nzbirds)
mtable(ms_inorder(nzbirds))
mtable(ms_inseq(nzbirds))
mtable(ms_inscore(nzbirds, mean0))
Combine the least common or most common levels of an mr object into an "other" level.
mr_lump(x, n, prop, other_level = "Other", ties.method = c("min", "average", "first", "last", "random", "max"),...)
x |
Object of class mr |
n |
Positive integer to keep the most common |
prop |
Positive prop preserves values that appear at least prop of the time. Negative prop preserves values that appear at most -prop of the time. |
other_level |
Label for the lumped levels |
ties.method |
How to handle ties. Passed to |
... |
not used |
An object of class mr
Based on fct_lump from the forcats package.
data(ethnicity)
mtable(ethnicity)
mtable(mr_lump(ethnicity,2))
mtable(mr_lump(ethnicity,-2))
data(rstudiosurvey)
## Other software being used
other_software<- as.mr(rstudiosurvey[[40]])
mtable(other_software)
## The top 20 responses
common<-mr_lump(other_software, n=20)
mtable(common)
## 'None' isn't really another package
mtable(mr_drop(common,"None"))
## Packages with at least 20% use
mtable(mr_lump(other_software, prop=0.2))
Relabel some or all of the levels of a multiple-response object. Two levels that are recoded to the same value will be combined.
mr_recode(x, ...)
x |
Object of class |
... |
new names in the form |
New object of class mr or ms
data(nzbirds)
nzbirds<-as.mr(nzbirds)
nzbirds
## recode to English names
mr_recode(nzbirds,morepork="ruru",stilt="kaki",waxeye="tauhou")
data(usethnicity)
race<-as.mr(usethnicity$Q5,"")
race<-mr_drop(race,c(" ","F","G","H"))
race <- mr_recode(race, AmIndian="A",Asian="B", Black="C", Pacific="D", White="E")
mtable(race)
Creates a data frame where every observation has as many rows as it has levels present, plus an id column to specify which rows go together.
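The stacking idea can be sketched in base R on a plain logical matrix: each TRUE cell becomes one row of the result, and the row index is kept as the id. This is only an illustration of the shape of the output under the assumption that the input is an ordinary logical matrix; it is not the package's code.

```r
m <- matrix(c(TRUE,  TRUE,  FALSE,
              FALSE, FALSE, FALSE,
              FALSE, TRUE,  TRUE),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("kea", "ruru", "tui")))

# One (row, col) pair per level that is present
idx <- which(m, arr.ind = TRUE)

stacked <- data.frame(values = colnames(m)[idx[, "col"]],
                      id = unname(idx[, "row"]))
# Keep the rows for each observation together
stacked <- stacked[order(stacked$id), ]
rownames(stacked) <- NULL
stacked
```

Observation 2 has no levels present, so in this sketch it contributes no rows.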
mr_stack(x, ..., na.rm = FALSE)
ms_stack(x, ..., na.rm = FALSE)
x |
multiple response object |
... |
other multiple response objects |
na.rm |
drop |
A data frame with columns values and id, plus a column scores if x is an ms object. When more than one object is supplied, the result is an outer join of the individual results, so it contains a row for every combination of an observed value from each object.
data(ethnicity)
ethnicity
mr_stack(ethnicity)
data(nzbirds)
nzbirds
ms_stack(nzbirds)
## not actually a sensible use
d <- mr_stack(ethnicity, nzbirds)
head(d)
with(d, table(ethnicity, nzbirds))
## equivalent, but more efficient
mtable(mr_na(ethnicity), mr_na(nzbirds))
These functions take the union, intersection, and difference of two multiple-response objects. An observation has a level in the union if it has that level in either input. It has the level in the intersection if it has the level in both inputs. It has the level in the difference if it has the level in x and not in y.
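These three definitions can be sketched in base R as elementwise logical operations, once both inputs are expanded to the same set of levels. The helper `align_levels` is hypothetical and the whole block is an illustration on plain logical matrices, not the package's implementation.

```r
# Hypothetical helper: expand a logical matrix to a common set of levels,
# filling levels it does not have with FALSE.
align_levels <- function(m, levels) {
  out <- matrix(FALSE, nrow = nrow(m), ncol = length(levels),
                dimnames = list(NULL, levels))
  out[, colnames(m)] <- m
  out
}

x <- matrix(c(TRUE, FALSE,
              TRUE, TRUE),
            nrow = 2, byrow = TRUE, dimnames = list(NULL, c("A", "B")))
y <- matrix(c(FALSE, TRUE,
              TRUE,  TRUE),
            nrow = 2, byrow = TRUE, dimnames = list(NULL, c("B", "C")))

levs <- union(colnames(x), colnames(y))
xa <- align_levels(x, levs)
ya <- align_levels(y, levs)

u <- xa | ya    # union: level present in either input
i <- xa & ya    # intersection: level present in both inputs
d <- xa & !ya   # difference: level present in x and not in y
u
i
d
```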
mr_union(x, y,...)
mr_intersect(x, y,...)
mr_diff(x, y,...)
x , y
|
Objects of class mr |
... |
not used |
Object of class mr
data(usethnicity)
race<-as.mr(usethnicity$Q5,"")
race<-mr_drop(race,c(" ","F","G","H"))
race <- mr_recode(race, AmIndian="A",Asian="B", Black="C", Pacific="D", White="E")
mtable(race)
hispanic<-as.mr(usethnicity$Q4==1, "Hispanic")
ethnicity<-mr_union(race, hispanic)
mtable(ethnicity)
ethnicity[101:120]
Returns a vector of TRUE or FALSE according to whether y is one of the levels present for that row (%has%) or is the only level present for that row (%hasonly%).
x %has% y
x %hasonly% y
x %hasall% ys
x %hasany% ys
x |
|
y |
character vector specifying a level |
ys |
character vector specifying one or more levels |
Logical vector
data(ethnicity)
ethnicity
ethnicity %has% "Maori"
ethnicity %hasonly% "Maori"
data(nzbirds)
as.mr(nzbirds)
as.mr(nzbirds)
Convert a multiple-response object into a named numeric vector using a supplied ordering.
ms_flatten(x, priorities, fun, start=0)
x |
|
priorities |
Character vector of levels. |
fun |
Function for reducing two values to one. |
start |
starting value for |
Each observation is initially assigned the value start. Starting with the lowest-priority level, the current value is combined with the new value as fun(new, current). Using fun=function(x,y) x would return the value for the highest-priority level present; using fun=pmax would return the highest score for any level present; using fun="+" would return the sum of the scores.
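The fold described above can be sketched in base R on a plain score matrix. The function `flatten_scores` is hypothetical; in particular, it assumes that a level only takes part in the combination for observations where its score is non-zero, which is one reading of "level present" and may not match ms_flatten exactly.

```r
# Hypothetical sketch of the fold: start from `start`, then combine with each
# level's score from lowest to highest priority as fun(new, current).
flatten_scores <- function(m, priorities, fun, start = 0) {
  fun <- match.fun(fun)
  current <- rep(start, nrow(m))
  for (lev in rev(priorities)) {
    new <- m[, lev]
    present <- new != 0   # assumption: absent levels are skipped
    current[present] <- fun(new[present], current[present])
  }
  current
}

scores <- matrix(c(1, 2, 0,
                   0, 1, 0,
                   3, 1, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("kea", "ruru", "tui")))
priorities <- c("tui", "ruru", "kea")   # highest priority first

total   <- flatten_scores(scores, priorities, "+")
first   <- flatten_scores(scores, priorities, function(x, y) x)
highest <- flatten_scores(scores, priorities, pmax)
total
first
highest
```

In the third observation all three levels are present, so `first` picks the score for "tui" (the highest-priority level) while `highest` picks the largest score, which belongs to "kea".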
A named numeric vector
data(ethnicity)
ethnicity
## NZ 'prioritised ethnicity'
eth <- mr_recode(ethnicity, `European/Other`="European", `European/Other` = "MELAA")
mr_flatten(ethnicity, c("Maori","Pacific","Asian","European/Other"))
data(nzbirds)
## hardest to see first
ms_flatten(nzbirds, c("kaki","ruru","kea","tui","tauhou"),"+")
ms_flatten(nzbirds, c("kaki","ruru","kea","tui","tauhou"), fun=function(x,y) x)
ms_flatten(nzbirds, c("kaki","ruru","kea","tui","tauhou"),pmin,start=Inf)
Creates one-way and two-way tables using every level of a multiple response object. Use table(as.character(x)) to tabulate combinations of levels.
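The difference between tabulating levels and tabulating combinations of levels can be sketched in base R on a plain logical matrix. This is an illustration of the counting only, assuming the input is an ordinary logical matrix; the "+" separator used to label combinations is arbitrary and is not claimed to match the package's output.

```r
m <- matrix(c(TRUE,  TRUE,  FALSE,
              TRUE,  FALSE, TRUE,
              FALSE, TRUE,  FALSE),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("kea", "ruru", "tui")))

# One-way table: every level counts once per observation that has it
one_way <- colSums(m)

# Two-way co-occurrence table of the object against itself:
# entry [a, b] is the number of observations with both a and b
x <- m + 0              # numeric 0/1 copy, dimnames are kept
co <- crossprod(x)

# Combinations: each observation counts once, under its full set of levels
combos <- apply(m, 1, function(r) paste(colnames(m)[r], collapse = "+"))
combo_table <- table(combos)

one_way
co
combo_table
```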
mtable(x, y, na.rm = TRUE)
x |
|
y |
|
na.rm |
remove missing values? |
A 1-d or 2-d array with names giving the levels
data(ethnicity)
mtable(ethnicity)
table(as.character(ethnicity))
data(nzbirds)
nzbirds<-as.mr(nzbirds)
## co-occurrence table
mtable(nzbirds, nzbirds)
## table by a factor
v<-rep(c("A","B"),3)
mtable(nzbirds,v)
data(nzbirds)
mtable(nzbirds>0)
A small artificial dataset that could be produced by asking people to name New Zealand birds. Each observation has scores from 1 (first bird named) to at most 4 (fourth bird named).
data("nzbirds")
An ms object with 6 observations on the following 5 variables.
kea
a numeric vector
ruru
a numeric vector
tui
a numeric vector
tauhou
a numeric vector
kaki
a numeric vector
data(nzbirds)
nzbirds
as.mr(nzbirds)
The plot method for mr objects is an UpSet plot, showing co-occurrences of the various categories. The image method is a heatmap of the variable plotted against itself with mtable.
## S3 method for class 'mr'
plot(x, ...)
## S3 method for class 'mr'
image(x, type = c("overlap", "conditional", "association", "raw"), ...)
## S3 method for class 'mr'
barplot(height,...)
x |
|
type |
|
height |
|
... |
Passed to |
Used for its side effect
data(rstudiosurvey)
other_software<- as.mr(rstudiosurvey[[40]])
## only those with at least 20 responses
common<-mr_lump(other_software, n=20)
common<-mr_drop(common, "None")
## UpSet plot
plot(common)
## images
image(common, type="conditional")
image(common, type="association")
The 'rstudiosurvey' data set contains 1838 rows of responses from the 2019 RStudio Community Survey, where columns are the 51 questions and a column for the timestamp. The variable names are the full questions. Multiple responses are separated by a comma and space. Non-ASCII characters have been converted with the "ASCII//TRANSLIT" option of iconv.
data("rstudiosurvey")
A data frame with 1838 observations on the following 52 variables.
Timestamp
a character vector
a character vector
a numeric vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a numeric vector
a character vector
a numeric vector
a character vector
a character vector
a character vector
a character vector
a character vector
a numeric vector
a numeric vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a numeric vector
a numeric vector
a character vector
a character vector
a character vector
a numeric vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a character vector
a numeric vector
a character vector
a character vector
https://github.com/rstudio/r-community-survey/tree/master/2019
data(rstudiosurvey)
names(rstudiosurvey)[40]
## Other software being used
other_software<- as.mr(rstudiosurvey[[40]])
mtable(other_software)
## top 20 responses
common<-mr_lump(other_software, n=20)
mtable(common)
## 'None' isn't really another package
common<-mr_drop(common, "None")
mtable(common)
## UpSet plot
plot(common)
## Excel users filled in the survey later
timestamp<-as.Date(rstudiosurvey[[1]],format="%m/%d/%y")
boxplot(timestamp~I(common %has% "Excel"))
## names in order of popularity
t<-mtable(common)
popular<-colnames(t)[order(t,decreasing=TRUE)]
## most popular package for each user
cuml_users <- mr_flatten(common, popular, sort=TRUE)
class(cuml_users)
table(cuml_users)
## two-way tables
## people who also use Stata or Julia are less happy with R than those who don't
names(rstudiosurvey)[18]
happy<-factor(rstudiosurvey[[18]])
mtable(happy, common)
round(prop.table(mtable(happy,common),2),2)
## mr objects can be dataframe columns, or expanded to individual levels
df<-data.frame(timestamp, happy, common)
dim(df)
head(df)
df_raw<-data.frame(timestamp, happy, as.matrix(common))
dim(df_raw)
head(df_raw)
This data set contains variables on race and ethnic identification from the 2017 Youth Risk Behaviour Survey, together with two variables on smoking behaviour. The YRBS is a multistage cluster-sampled survey, so valid inference about associations requires using survey design information. This subset is useful only for demonstration purposes.
data("usethnicity")
A data frame with 14765 observations on the following 4 variables.
Q4
1 is "Hispanic or Latino"
Q5
Character string with zero or more of: A. American Indian or Alaska Native, B. Asian, C. Black or African American, D. Native Hawaiian or Other Pacific Islander, E. White
QN30
1 is "smoked cigarettes on one or more of the past 30 days"
QN31
1 is "smoked more than 10 cigarettes per day on the days they smoked during the past 30 days", those who did not smoke at all are NA
https://www.cdc.gov/healthyyouth/data/yrbs/data.htm
data(usethnicity)
race<-as.mr(strsplit(as.character(usethnicity$Q5),""))
race<-mr_drop(race," ")
mtable(race)
hispanic<-as.mr(usethnicity$Q4==1,"Hispanic")
ethnicity<-mr_union(race,hispanic)
ethnicity[101:120]
Similar to mr_stack, but it works on whole data frames rather than individual vectors, expanding a multiple-response variable to a factor with a record for each observed response. The algorithm is to convert the object to a logical matrix, pivot_longer, then filter on the logical value.
vmr_stack(data, col, names_to = "level")
data |
a tibble |
col |
the name of a multiple-response column in |
names_to |
the desired output name of the variable with the stacked levels |
An expanded data frame
data(ethnicity)
if(require("tidyr",quietly=TRUE)){
  t<-tibble(a=LETTERS[1:6], e=as.vmr(ethnicity,na.rm=TRUE))
  t
  t |> vmr_stack(e,names_to="ethnicity")
}