```{r} library(dplyr) library(gapminder) class(gapminder) ``` * `glimpse` -- a better `str` ```{r} glimpse(gapminder) ``` ## dplyr verbs * `sample_frac` -- sample a given percentage of rows ```{r} sample_frac(gapminder, 0.5) ``` * `sample_n` -- sample n rows ```{r} set.seed(2016) tiny <- sample_n(gapminder, 3) tiny ``` * `rename` -- rename columns ```{r} rename(tiny, GDP = gdpPercap, population = pop) ``` * `select` -- select columns from the data frame ```{r} select(tiny, starts_with("y"), pop, matches("^co.*")) ``` * `filter` -- select rows from the data frame, producing a subset ```{r} # filter(tiny, lifeExp > 60 & year < 1980) # or equivalent: filter(tiny, lifeExp > 60, year < 1980) ``` * `slice` -- select rows from data frame by index, producing a subset ```{r} slice(gapminder, 300:303) ``` * `mutate` -- add new columns that can be functions of existing columns ```{r} mutate(tiny, newVar = (lifeExp / gdpPercap), newcol = 3:1) tiny <- mutate(tiny, newVar = (lifeExp / gdpPercap), newVar2 = newVar^2) glimpse(tiny) ``` * `transmute` -- add new columns that can be functions of the existing columns, and drop the existing columns ```{r} tiny <- transmute(tiny, id = 1:3, country, continent, newVarSqrt = sqrt(newVar), pop) tiny ``` * `arrange` -- reorder rows ```{r} arrange(tiny, pop) arrange(gapminder, desc(year), lifeExp) ``` * `summarize` -- create collapsed summaries of a data frame by applying functions to columns ```{r} summarize(gapminder, aveLife = mean(lifeExp)) ``` * `distinct` -- find distinct rows, for repetitive data ```{r} tiny2 <- tiny[c(1,1,2,2), ] dim(tiny2) distinct(tiny2) n_distinct(tiny2) ``` ### Chaining #### Base-R-style ```{r} set.seed(2016) sample_n(filter(gapminder, continent == "Asia" & lifeExp < 65), 2) ``` #### Using pipes ```{r} set.seed(2016) gapminder %>% filter(continent == "Asia") %>% filter(lifeExp < 65) %>% sample_n(2) ``` ### More verbs * `group_by` -- convert the data frame into a grouped data frame, where the operations are performed by group ```{r} gapminder %>% group_by(continent) %>% summarize(aveLife = mean(lifeExp), count = n(), countries = n_distinct(country)) ``` ```{r} gapminder %>% group_by(continent) %>% tally ``` ### Join multiple data frames Example originally from ```{r} superheroes <- c("name, alignment, gender, publisher", "Magneto, bad, male, Marvel", "Storm, good, female, Marvel", "Mystique, bad, female, Marvel", "Batman, good, male, DC", "Joker, bad, male, DC", "Catwoman, bad, female, DC", "Hellboy, good, male, Dark Horse Comics") superheroes <- read.csv(text = superheroes, strip.white = TRUE, as.is=TRUE) publishers <- c("publisher, yr_founded", " DC, 1934", " Marvel, 1939", " Image, 1992") publishers <- read.csv(text = publishers, strip.white = TRUE, as.is=TRUE) ``` ```{r} inner_join(superheroes, publishers) ``` ```{r} left_join(superheroes, publishers) ``` ```{r} full_join(superheroes, publishers) ```