--- vignette: > %\VignetteIndexEntry{Multiple responses are for the birds} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} title: "Multiple responses are for the birds" author: "Thomas Lumley" date: "02/09/2019" output: rmarkdown::html_vignette --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(rimu) ``` The `rimu` package handles multiple-response data, a generalisation of factor data. With factor data, there is a defined set of categories and each observation comes from one category. With multiple-reponse data, there is a defined set of categories, but each observation could come from multiple categories. We provide two classes: `mr` for multiple-response presence/absence data, and `ms` for scored or ranked multiple-responses where each category gets a non-zero score or rank. The `birds` dataset is a small subset of data from the Great Backyard Bird Count, in the US and Canada. We have counts of 12 birds by US county and Canadian province. The twelve birds are ```{r} data(birds) names(birds)[1:12] ``` There's a thirteenth column giving the location name. These birds are perhaps more familiar as - Common poorwill, *Phalaenoptilus nuttallii* - Frigatebird *Fregata magnificens* - Lewis's woodpecker *Melanerpes lewis* - Swamp sparrow *Melospiza georgiana* - Virginia rail *Rallus limicola* - Painted redstart *Myioborus pictus* - Mountain chickadee *Poecile gambeli* - Ring-necked duck *Aythya collaris* - Yellowheaded blackbird *Xanthocephalus xanthocephalus* - Common hill myna *Gracula religiosa* - Scott's oriole *Icterus parisorum* - Black-billed cuckoo *Coccyzus erythropthalmus* First, let's put them into the data structures ```{r} bird_count <- as.ms(birds[,-13],na.rm=TRUE) bird_presence <- as.mr(bird_count) ``` The bird counts will print like a sparse matrix ```{r} head(bird_count) ``` but the bird presence/absence data has a more compact character form ```{r} head(bird_presence) ``` What birds are most often present? ```{r} mtable(bird_presence) ``` And what birds tend to go together? We can draw an upset chart ```{r} plot(bird_presence,nsets=12) ``` That's all a bit clumsy because of the long names,but you can see, for example, that the swamp sparrow and ring-necked duck tend to co-occur. Let's recode to shorter names. ```{r error=TRUE} bird_presence<-mr_recode(bird_presence, poorwill="Phalaenoptilus nuttallii", frigatebird="Fregata magnificens", woodpecker ="Melanerpes lewis", sparrow="Melospiza georgiana", rail="Rallus limicola", redstart="Myioborus pictus", chickadee="Poecile gambeli", duck="Aythya collaris", yellowhead="Xanthocephalus xanthocephalus", myna="Dracula religiosa", oriole="Icterus parisorum", cuckoo="Coccyzus erythropthalmus") ``` Oops. ```{r } bird_presence<-mr_recode(bird_presence, poorwill="Phalaenoptilus nuttallii", frigatebird="Fregata magnificens", woodpecker ="Melanerpes lewis", sparrow="Melospiza georgiana", rail="Rallus limicola", redstart="Myioborus pictus", chickadee="Poecile gambeli", duck="Aythya collaris", yellowhead="Xanthocephalus xanthocephalus", myna="Gracula religiosa", oriole="Icterus parisorum", cuckoo="Coccyzus erythropthalmus") ``` Now try again: ```{r} mtable(bird_presence) mtable(bird_presence,bird_presence) plot(bird_presence, nsets=12,nint=30) ``` The default `image` plot is of the table of the variable by itself and shows the number of co-occurences. With `type="conditional"`, the plot shows the proportion of each bird (on the y-axis) given that a particular bird (on the x-axis) is present. ```{r} image(bird_presence) image(bird_presence, type="conditional") ``` We might want to focus on just the more commonly observed birds ```{r} common_birds<-mr_lump(bird_presence,n=4) mtable(common_birds) mtable(common_birds,common_birds) plot(common_birds) ``` Or consider just the rare and interesting ones ```{r} rare_birds<-mr_lump(bird_presence,n=-5,other_level="Common") mtable(rare_birds) mtable(rare_birds,rare_birds) plot(rare_birds,nsets=6) plot(mr_drop(rare_birds,"Common"),nsets=5) ```