Exercise

Exploring a data frame

Often the most interesting features of your data are the relationships between the variables. If only a handful of variables are saved as columns in a data frame, all of these relationships can be visualized neatly in a single plot.

Base R offers a fast plotting function, pairs(), which draws all possible scatter plots from the columns of a data frame, resulting in a scatter plot matrix. The GGally and ggplot2 libraries together offer a slower but more detailed look at the variables, their distributions, and their relationships.
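As a minimal sketch of the base R approach, the code below builds a small made-up data frame and passes it to pairs(). The stand-in data and its columns (age, attitude, points) are assumptions for illustration only; the real learning2014 data is not reproduced here. The gender column is stored as a factor so that it can be used directly as a colour.

```r
# Hypothetical stand-in for learning2014 (invented values, illustration only)
set.seed(1)
learning2014 <- data.frame(
  gender   = factor(sample(c("F", "M"), 50, replace = TRUE)),
  age      = round(runif(50, min = 18, max = 55)),
  attitude = round(rnorm(50, mean = 3, sd = 0.7), 1),
  points   = round(rnorm(50, mean = 22, sd = 5))
)

# Keep every column except gender
numeric_vars <- learning2014[names(learning2014) != "gender"]

# Scatter plot matrix of the remaining columns
pairs(numeric_vars)

# Same matrix, with points coloured by the levels of the gender factor
pairs(numeric_vars, col = learning2014$gender)
```

Each off-diagonal panel is one scatter plot of a pair of columns, so a data frame with k numeric columns gives a k by k grid.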

Instructions

  • Draw a scatter matrix of the variables in learning2014 (other than gender)
  • Adjust the code: Add the argument col to the pairs() function, defining the colour with the 'gender' variable in learning2014.
  • Draw the plot again to see the changes.
  • Access the ggplot2 and GGally libraries and create the plot p with ggpairs().
  • Draw the plot. Note that the function is a bit slow.
  • Adjust the argument mapping of ggpairs() by defining col = gender inside aes().
  • Draw the plot again.
  • Adjust the code a little more: add another aesthetic element alpha = 0.3 inside aes().
  • See the difference between the plots?
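The ggpairs() steps above can be sketched roughly as follows. This is not the exercise's own solution code: the data frame is a made-up stand-in for learning2014 (its columns and values are assumptions), and the sketch assumes the ggplot2 and GGally packages are installed.

```r
library(ggplot2)
library(GGally)

# Hypothetical stand-in for learning2014 (invented values, illustration only)
set.seed(1)
learning2014 <- data.frame(
  gender   = factor(sample(c("F", "M"), 50, replace = TRUE)),
  age      = round(runif(50, min = 18, max = 55)),
  attitude = round(rnorm(50, mean = 3, sd = 0.7), 1),
  points   = round(rnorm(50, mean = 22, sd = 5))
)

# Plot matrix with colour mapped to gender and semi-transparent marks,
# both set inside aes() as the instructions describe
p <- ggpairs(learning2014, mapping = aes(col = gender, alpha = 0.3))

# Draw the plot (this step can take a moment)
p
```

Mapping col to gender splits the distributions and correlations by group, and the alpha value makes overlapping points and density curves easier to tell apart.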