New Tips from the RCS Stats Team

April 18, 2018
R Library - DPLYR

Dplyr is an R package for data manipulation. Once you understand its syntax, it lets you write much more concise, readable blocks of data manipulation code. Dplyr is built around 5 verbs: Select, Filter, Arrange, Mutate, and Summarize.

Select - Selects certain columns in your dataframe
Filter - Selects specific rows in your dataframe
Arrange - Orders the rows in your dataframe
Mutate - Creates new columns in your dataframe
Summarize - Summarizes chunks of your dataframe
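
As a rough illustration of the five verbs, here is a minimal sketch that applies each one on its own. The data frame scores_df and its columns (school, state, SAT_score) are made up for this example, as is the derived pct_of_max column, which assumes a maximum score of 1600. Note that the actual function names are lowercase.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
scores_df <- data.frame(
  school    = c("A", "B", "C", "D"),
  state     = c("VT", "VT", "NH", "NH"),
  SAT_score = c(1100, 1300, 1000, 1200),
  stringsAsFactors = FALSE
)

# select: keep only the school and SAT_score columns
picked <- select(scores_df, school, SAT_score)

# filter: keep only rows with a score above 1100
high <- filter(scores_df, SAT_score > 1100)

# arrange: order rows by score, lowest first
sorted <- arrange(scores_df, SAT_score)

# mutate: add a new column (assumes 1600 is the maximum score)
with_pct <- mutate(scores_df, pct_of_max = SAT_score / 1600)

# summarize: collapse the whole dataframe to a single summary row
overall <- summarize(scores_df, average_score = mean(SAT_score))
```

Each call takes a dataframe as its first argument and returns a new dataframe, leaving scores_df unchanged.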

These commands are tied together by piping the result of one command into the next using %>%. Imagine a data set covering thousands of schools in all 50 states, from which we want the average SAT score for students in each New England state. You could write the entire operation this way:

summarized_df <- student_df %>% filter(region == 'NE') %>% group_by(state) %>% summarize(average_score = mean(SAT_score))

This would create a new dataframe named summarized_df that would contain a line for each NE state and a variable called average_score with the average SAT score for students in that state.
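
To try the pipeline end to end, the sketch below runs it on a tiny made-up student_df. The rows and the 'S' region code for the non-New England state are assumptions for illustration; only the column names region, state, and SAT_score come from the command above.

```r
library(dplyr)

# Hypothetical student-level data: four NE rows and one non-NE row
student_df <- data.frame(
  state     = c("VT", "VT", "NH", "NH", "TX"),
  region    = c("NE", "NE", "NE", "NE", "S"),
  SAT_score = c(1100, 1300, 1000, 1200, 1400),
  stringsAsFactors = FALSE
)

summarized_df <- student_df %>%
  filter(region == "NE") %>%                    # keep New England rows only
  group_by(state) %>%                           # one group per state
  summarize(average_score = mean(SAT_score))    # one row per group

print(summarized_df)
```

With this toy data the result has two rows, NH with an average of 1100 and VT with an average of 1200. The TX row is dropped by the filter step. Each %>% passes the dataframe on its left as the first argument to the function on its right, which is why none of the verbs name the dataframe explicitly.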