# Reference

## Things you should know for the Midterm

### You should know how to…

- …plot the five most common graphs in `ggplot()`
- …decide when to use `facet_wrap` and apply it
- …customize `color`
- …use `theme_()` effectively
- …extract subsets of a dataset using `filter`

- …use the logical operators effectively (`==`, `!=`, `&`, `|`)
- …calculate summary statistics (mean, median, min, max) using `summarise`

- …calculate these statistics *by* one or more variables using `group_by()`
- …calculate group *proportions* using `group_by` (e.g., the *gss_cat* example from homework 3)
- …create new variables out of existing ones using `mutate`

- …code new variables using `case_when`
- …sort dataframes by one or more variables using `arrange`
- …keep only a subset of variables in your dataset using `select`

- …merge two datasets with `inner_join`, *even when* the key columns they share have different names
- …convert categorical variables to factors using `factor` and `fct_reorder`

- …tidy data at a conceptual level, i.e., know what it means for data to be “tidy”
- …transform data from wide format to long format using `gather`
- …draw a trend line using `geom_smooth`

- …talk about and interpret correlation coefficients
- …speak the language of statistical modeling, i.e., outcome variable, explanatory variable
- …run an OLS model using `lm` and extract its output
- …interpret results from an OLS model, e.g., the coefficient, “one unit change in…”
- …interpret coefficient estimates of *continuous* and *categorical* variables
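
Several of the wrangling and modeling items above can be practiced together. The following is only a rough sketch: the data frames `students` and `majors`, and every column name in them, are invented for illustration and do not come from the course homework. It assumes the `dplyr` package is installed.

```r
library(dplyr)

# Hypothetical toy data, made up for this sketch
students <- tibble(
  id    = 1:6,
  major = c("soc", "soc", "econ", "econ", "econ", "hist"),
  gpa   = c(3.2, 3.8, 2.9, 3.5, 3.1, 3.6),
  hours = c(10, 14, 6, 12, 8, 13)
)
majors <- tibble(
  code     = c("soc", "econ", "hist"),
  division = c("social science", "social science", "humanities")
)

# Summary statistics by one variable
gpa_by_major <- students %>%
  group_by(major) %>%
  summarise(mean_gpa = mean(gpa), max_gpa = max(gpa))

# Group proportions: count rows per group, then divide by the overall total
major_props <- students %>%
  group_by(major) %>%
  summarise(n = n()) %>%
  mutate(prop = n / sum(n))

# inner_join when the key columns have different names:
# `major` in students matches `code` in majors
joined <- students %>%
  inner_join(majors, by = c("major" = "code"))

# Code a new variable with case_when, then sort by it and by gpa
coded <- students %>%
  mutate(gpa_level = case_when(
    gpa >= 3.5 ~ "high",
    TRUE       ~ "other"
  )) %>%
  arrange(gpa_level, desc(gpa))

# OLS: gpa is the outcome variable, hours the explanatory variable
fit   <- lm(gpa ~ hours, data = students)
slope <- coef(fit)[["hours"]]  # change in gpa per one-unit change in hours
```

Printing `major_props` shows one row per major with its share of all rows, and `summary(fit)` gives the full regression table for the toy model.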

## Things you should know for the Final

### You should know…

- …the fundamental problem of causality and why we can’t simply regress Y on X to estimate the effect of X on Y
- …the difference between experimental research (e.g., rats and insulin) and observational research (e.g., school suspensions and crime)
- …how to make causal diagrams (DAGs) that explain how different variables affect one another
- …how to explain and give real-world examples of the “fundamental confounds”, especially forks, pipes, and colliders, and how you should deal with each of them
- …what a backdoor is, and why we worry about backdoors in research
- …how we can use multiple regression to close backdoors
- …how to interpret coefficient estimates with specificity, using the units in which the variables are measured
- …the logic of a fixed effects model, when we use fixed effects, what they control for (and what they don’t)
- …how to do coarsened exact matching informally (by hand), and understand why we use matching
- …difference-in-differences, when we use it, how to calculate it by hand, and its limitations
- …regression discontinuity, when we use it, how to do it by hand, and its limitations
- …the difference between populations and samples, and why we worry about estimating stuff from samples
- …why we can use bootstrapping to get a sense of sampling uncertainty in our estimate, and the mechanics of how to do it
- …what a confidence interval is and how to interpret it
- …the logic of a hypothesis test, what the null hypothesis is, how we can use permutation to generate a null distribution
- …what a p-value is, how to interpret it, what statistical significance is
- …how to read a table with regression output (coefficient, standard error, p-value, significance)
- …the tradeoff between Type 1 and Type 2 error in increasing/decreasing the alpha level
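
The mechanics of bootstrapping and permutation mentioned in the list above can be sketched in base R. This is a minimal illustration under assumptions, not the course's own code: the `treated` and `control` samples are simulated, and the 2000 resamples and the 95% level are arbitrary choices.

```r
set.seed(42)

# Hypothetical simulated samples, standing in for real outcome data
treated <- rnorm(50, mean = 5.5, sd = 2)
control <- rnorm(50, mean = 5.0, sd = 2)
observed_diff <- mean(treated) - mean(control)

# Bootstrap: resample each group with replacement and recompute the
# difference in means to see how much the estimate varies across samples
boot_diffs <- replicate(2000, {
  mean(sample(treated, replace = TRUE)) - mean(sample(control, replace = TRUE))
})

# Percentile confidence interval from the bootstrap distribution
ci_95 <- quantile(boot_diffs, c(0.025, 0.975))

# Permutation: shuffle the pooled values so group labels are meaningless,
# which generates the distribution of differences under the null hypothesis
pooled    <- c(treated, control)
n_treated <- length(treated)
perm_diffs <- replicate(2000, {
  shuffled <- sample(pooled)
  mean(shuffled[1:n_treated]) - mean(shuffled[-(1:n_treated)])
})

# Two-sided p-value: share of null differences at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(observed_diff))
```

Comparing `p_value` with a chosen alpha level (for example 0.05) is the step where the Type 1 versus Type 2 tradeoff enters: a lower alpha rejects the null less often, whether or not it is true.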