3/18/2018

Data Visualization 101

A good graph is like a good joke

  • It's not funny if you have to explain it
    • Should be appropriate for your audience
    • Should be appropriate for your data
  • It boils down to communication: make sure your graph conveys the relevant information in a digestible way

ggplot2

  • Created by Hadley Wickham in 2005 for data visualization in R
  • Alternative to base R graphics (more customizable & versatile)
  • Utilizes a graphical grammar of layers, where each component of the plot needs to be separately specified
  • Extension projects include

Grammar of ggplot2

  • ggplot works in layers
  • Each layer is added sequentially to produce the plot
  • Each layer has five components, which together make up a ggplot call
    • A dataset
    • The aesthetic mapping (aes())
    • A statistical transformation (stat = ); there are some defaults, so we don't always have to specify this
    • A geometric object (geom_)
    • A position adjustment (position = )
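
As a rough sketch of how the five components fit together, the call below spells out each of them in a single layer() call. The toy data frame and its age values are invented for illustration, and the bin count passed through params is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in data; the values are invented for illustration
toy <- data.frame(age = c(19, 20, 20, 21, 22, 22, 23, 25, 27, 30))

p <- ggplot() +
  layer(
    data = toy,                # 1. dataset
    mapping = aes(x = age),    # 2. aesthetic mapping
    stat = "bin",              # 3. statistical transformation
    geom = "bar",              # 4. geometric object
    position = "stack",        # 5. position adjustment
    params = list(bins = 5)    # arbitrary bin count for the "bin" stat
  )

p  # draws a histogram of age
```

In everyday use most of these arguments are filled in by defaults, so a call this explicit is rarely needed.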

(1) The Dataset

  • First, load ggplot2 (plus any other relevant packages) and import your dataset
library(ggplot2)
library(dplyr)

df = read.csv("data/allANT.csv")
df_run = read.csv("data/ANT_runsheet.csv")
  • We can use the pipe operator (%>%, loaded with dplyr) to pass the dataset into the ggplot function!
  • If we plan to use the same dataset for the whole plot, we only need to specify it once, rather than at each layer

  • Note, the package is called "ggplot2" and the function is called "ggplot"

df %>% ggplot(...)

#To assign a plot to a variable...
plot1 = df %>% ggplot(...)
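
To make the piping step concrete, here is a minimal sketch showing that piping a data frame into ggplot() is the same as passing it as the data argument. The toy data frame is a made-up stand-in for the real dataset.

```r
library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Hypothetical stand-in data; the values are invented for illustration
toy <- data.frame(age = c(19, 21, 22, 25, 30))

# The pipe passes `toy` as the first argument of ggplot(), which is `data`
piped  <- toy %>% ggplot(aes(age))
direct <- ggplot(data = toy, mapping = aes(age))

# Both plot objects carry the same dataset
identical(piped$data, direct$data)
```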

(2) The Aesthetic Mapping

  • The aesthetic mapping is the meat (or equivalent vegetarian protein form) of the plot
  • Identifies what specific information from the dataset will be selected and represented by the layer

  • Let's say we want to see the distribution of 'age' in our dataset by constructing a histogram from the runsheet

plot1 = df_run %>% ggplot(aes(age))
  • Explicitly specifying the x and y is optional: ggplot will take the first as x and the second as y (or, for univariate plotting like a histogram, you can enter just one variable)
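
One way to check how aes() assigns unnamed arguments is to look at the names of the mapping it returns. In this sketch, score is a hypothetical second column used only for illustration; aes() does not evaluate its arguments, so no data is needed.

```r
library(ggplot2)

# aes() matches unnamed arguments by position: first is x, second is y
one_var  <- aes(age)          # one variable, mapped to x
two_vars <- aes(age, score)   # `score` is a hypothetical second column
explicit <- aes(x = age)      # the explicit form of the first call

names(one_var)
names(two_vars)
names(explicit)
```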

  • What does our plot look like?
plot1

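  • There are no layers yet, so no data is drawn: recent versions of ggplot2 show only an empty panel with the x-axis, while older versions refused to display the plot at all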
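
A quick way to confirm that a plot object has no layers yet is to inspect its layers field. This is a small sketch on made-up data, not the runsheet itself.

```r
library(ggplot2)

# Hypothetical stand-in data; the values are invented for illustration
toy <- data.frame(age = c(19, 21, 22, 25, 30))

# Data and aesthetic mapping only, no layers added
p <- ggplot(toy, aes(age))

length(p$layers)  # number of layers added so far
```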

(3) The Statistical Transformation

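  • Performs a statistical transformation on the data, such as binning values and counting them (stat = "bin")
  • Or can keep data as is (stat = "identity"); a histogram needs binned counts, so we will use stat = "bin" here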
  • Each stat has a default geom (just as every geom has a default stat)

  • We add to our previous code…

plot1 = df_run %>% ggplot(aes(age)) +
  layer(
    mapping = NULL, #already provided above
    data = NULL, #already provided above
    stat = "bin"
  )
  • But this does not create a full layer yet: layer() also requires a geom and a position, so this call raises an error as written
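
The pairing between stats and geoms can be inspected directly. This sketch assumes the layer objects returned by stat_bin() and geom_histogram() expose geom and stat fields, which is how current ggplot2 versions behave.

```r
library(ggplot2)

# Each helper returns a layer object whose fields show the default pairing
default_geom <- class(stat_bin()$geom)[1]        # geom used by stat_bin()
default_stat <- class(geom_histogram()$stat)[1]  # stat used by geom_histogram()

default_geom
default_stat
```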

(4) A Geometric Object

  • Performs the actual rendering of the layer, controlling the type of plot that you create
  • Many different types of plots supported by ggplot
  • Can serve as a shortcut to writing out all components of a layer (we'll get to this soon)
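
As a minimal sketch of the shortcut idea, geom_histogram() builds a complete layer in one call, filling in the stat and position itself. The toy data and the bin count are arbitrary stand-ins.

```r
library(ggplot2)

# Hypothetical stand-in data; the values are invented for illustration
toy <- data.frame(age = c(19, 20, 20, 21, 22, 22, 23, 25, 27, 30))

# geom_histogram() supplies stat = "bin" and a default position for us
p <- ggplot(toy, aes(age)) + geom_histogram(bins = 5)

p  # draws the histogram

# layer_data() returns the binned values the layer will draw
binned <- layer_data(p)
sum(binned$count)  # total count across all bins
```

Every observation falls into exactly one bin, so the total count matches the number of rows in the data.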

Common Geoms
