Exercise

Box plots by groups

Box plots are an excellent way of displaying and comparing distributions. A box plot visualizes the 25th, 50th and 75th percentiles (the box), the typical range (the whiskers) and the outliers of a variable.

The whiskers extending from the box can be computed in several ways. The default (in base R and ggplot2) is to extend each whisker to the most extreme data point that is no more than 1.5*IQR away from the edge of the box, where IQR is the interquartile range, defined as

IQR = 75th percentile - 25th percentile

Values outside the whiskers can be considered outliers, that is, unusually distant observations. For more information on the IQR, see Wikipedia.
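The whisker rule above can be worked through by hand in base R. The following is a minimal sketch on made-up numbers (the vector `x` is purely illustrative and not part of the course data). It uses `quantile()` with its default settings; note that base R's `boxplot()` computes the box edges as hinges, which can differ slightly from these quantiles on small samples.

```r
# Toy data, invented for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# 25th and 75th percentiles (the edges of the box)
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)

# IQR = 75th percentile - 25th percentile
iqr <- q[2] - q[1]

# Limits that lie 1.5 * IQR away from the box
lower_limit <- q[1] - 1.5 * iqr
upper_limit <- q[2] + 1.5 * iqr

# Each whisker ends at the most extreme data point inside the limits
whisker_low  <- min(x[x >= lower_limit])
whisker_high <- max(x[x <= upper_limit])

# Points beyond the whiskers are drawn individually as outliers
outliers <- x[x < lower_limit | x > upper_limit]

print(c(iqr = iqr, whisker_low = whisker_low, whisker_high = whisker_high))
print(outliers)
```

Here only the value 30 falls outside the limits, so it would be drawn as a single outlier point above the upper whisker.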

Instructions

  • Initialize a plot of student grades (G3), with high_use on the x-axis grouping the grade distributions. Draw the plot as a box plot.
  • Add an aesthetic element to the plot by defining col = sex inside aes().
  • Define a similar box plot of the variable absences, grouped by high_use on the x-axis and with the aesthetic col = sex.
  • Add a main title to the last plot with ggtitle("title here"). Use "Student absences by alcohol consumption and sex" as a title.
  • Does high use of alcohol have a connection to school absences?
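The steps above could be sketched roughly as follows. This is one possible approach, not the official solution. The exercise does not name the data frame here, so `students` is a hypothetical stand-in filled with randomly generated values; only the column names (G3, absences, high_use, sex) come from the instructions. The object names `p1` and `p2` are likewise assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for the course data: random values,
# only the column names follow the exercise text
set.seed(1)
students <- data.frame(
  high_use = rep(c(FALSE, TRUE), each = 20),
  sex      = rep(c("F", "M"), times = 20),
  G3       = round(rnorm(40, mean = 11, sd = 3)),
  absences = rpois(40, lambda = 5)
)

# Grades by high_use, with col = sex defined inside aes(),
# drawn as a box plot
p1 <- ggplot(students, aes(x = high_use, y = G3, col = sex)) +
  geom_boxplot()

# Absences by high_use and sex, with a main title
p2 <- ggplot(students, aes(x = high_use, y = absences, col = sex)) +
  geom_boxplot() +
  ggtitle("Student absences by alcohol consumption and sex")

print(p1)
print(p2)
```

Because the values are random, these plots say nothing about the real relationship between alcohol use and absences; with the actual course data, compare the medians and the spread of the boxes across the high_use groups to answer the last question.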