Exercise

Find the outlaw... Outlier!

As an example of the summary measure approach, we will look at the post-treatment values of the BPRS. The mean of weeks 1 to 8 will be our summary measure. First calculate this measure, then look at boxplots of the measure for each treatment group. Notice how the mean summary measure is more variable in the second treatment group and that its distribution in this group is somewhat skewed. The boxplot of the second group also reveals an outlier: a subject whose mean BPRS score over the eight weeks is above 70. This subject might bias the conclusions from further comparisons of the groups, so we shall remove it from the data. Without the outlier, try to figure out which treatment group might have the lower eight-week mean. Then think: considering the variation, how can we be sure?

Instructions

  • Create the summary data BPRSL8S
  • Glimpse the data
  • Draw the boxplot and observe the outlier
  • Find a suitable threshold value and use filter() to exclude the outlier, forming a new data set BPRSL8S1
  • Glimpse the new data and draw a boxplot of it to check that the outlier has been removed
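
The steps above can be sketched roughly as follows. This is a minimal illustration, not the course's reference solution. It runs on a small made-up stand-in for the long-format data, here called BPRSL. The column names week, subject, treatment and bprs, the summary column name mean_bprs, and the threshold of 70 are all assumptions chosen for the example. With the real data, pick the threshold by looking at the boxplot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the long-format data: one row per subject and week.
# All values are invented; only the shape (weeks 0 to 8, two treatment groups)
# follows the exercise text.
BPRSL <- expand.grid(week = 0:8, subject = 1:4, treatment = 1:2)
BPRSL$bprs <- 40 + 2 * BPRSL$subject + 3 * BPRSL$treatment - BPRSL$week

# Plant one unusually high subject in the second treatment group.
is_out <- BPRSL$treatment == 2 & BPRSL$subject == 4
BPRSL$bprs[is_out] <- BPRSL$bprs[is_out] + 35

BPRSL$treatment <- factor(BPRSL$treatment)
BPRSL$subject <- factor(BPRSL$subject)

# Summary measure: mean of the post-treatment weeks (1 to 8) per subject.
BPRSL8S <- BPRSL %>%
  filter(week > 0) %>%
  group_by(treatment, subject) %>%
  summarise(mean_bprs = mean(bprs), .groups = "drop")

glimpse(BPRSL8S)

# Boxplot of the summary measure by treatment group.
p1 <- ggplot(BPRSL8S, aes(x = treatment, y = mean_bprs)) +
  geom_boxplot() +
  labs(y = "mean(bprs), weeks 1-8")
print(p1)

# Exclude the outlier with a threshold (70 is an assumption for this toy data).
BPRSL8S1 <- BPRSL8S %>%
  filter(mean_bprs < 70)

glimpse(BPRSL8S1)

# Redraw the boxplot to check that the outlier is gone.
p2 <- ggplot(BPRSL8S1, aes(x = treatment, y = mean_bprs)) +
  geom_boxplot() +
  labs(y = "mean(bprs), weeks 1-8")
print(p2)

# Group means alongside their spread, as a first look at the final question.
group_summary <- BPRSL8S1 %>%
  group_by(treatment) %>%
  summarise(group_mean = mean(mean_bprs), group_sd = sd(mean_bprs), n = n())
print(group_summary)
```

Filtering on the summary value rather than on a subject number keeps the rule independent of how subjects are labelled, which matters if subject IDs repeat across treatment groups.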