Exercise

Dealing with not available (NA) values

In R, NA stands for not available, which means that the data point is missing. If a variable you wish to analyse contains missing values, there are usually two main options:

  • Remove the observations with missing values.
  • Replace the missing values with actual values using an imputation technique.

We will use the first option, which is the simplest solution.
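To make the two options concrete, here is a minimal base R sketch. The data frame toy and its values are made up for illustration and are not the course's human data. Mean imputation is used only as one simple example of an imputation technique.

```r
# Toy data frame (hypothetical values, not the course's human data)
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  gni     = c(100, NA, 300, 400),
  life    = c(70, 65, NA, 80)
)

# Option 1: remove every observation (row) that has at least one NA
removed <- na.omit(toy)

# Option 2: replace each NA with the mean of the observed values
# in the same column (a very simple imputation technique)
imputed <- toy
imputed$gni[is.na(imputed$gni)]   <- mean(toy$gni, na.rm = TRUE)
imputed$life[is.na(imputed$life)] <- mean(toy$life, na.rm = TRUE)

nrow(removed)  # rows left after removal
nrow(imputed)  # imputation keeps every row
```

Removal is simpler but discards rows, while imputation keeps every row at the cost of introducing estimated values.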

Instructions

  • Create a smaller version of the human data by selecting the variables defined in keep.
  • Use complete.cases() on human to print out a logical "completeness indicator" vector.
  • Adjust the code: Define comp as the completeness indicator and print out the resulting data frame. When is the indicator FALSE and when is it TRUE? (hint: ?complete.cases()).
  • Use filter() to drop all the rows that contain any NA values. As written, the code filters on TRUE, which is recycled across every row, so nothing is filtered out.
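The steps above can be sketched on a small made-up data frame. This is an illustration under assumptions, not the exercise's own code: toy, small, and the column names are hypothetical, and base R indexing stands in for the select() and filter() calls the exercise uses.

```r
# Toy data frame (hypothetical; stands in for the human data)
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  gni     = c(100, NA, 300, 400),
  life    = c(70, 65, NA, 80),
  note    = c("x", "y", "z", NA)
)

# Columns to keep (the exercise defines its own keep vector)
keep  <- c("country", "gni", "life")
small <- toy[, keep]

# Completeness indicator: TRUE only for rows with no NA in any column
comp <- complete.cases(small)

# Show the indicator next to the data it describes
data.frame(small, comp = comp)

# A single TRUE is recycled to every row, so all rows are kept
all_rows <- small[TRUE, ]

# Indexing with comp keeps only the complete rows
complete_rows <- small[comp, ]
```

The indicator is FALSE for any row where at least one of the selected columns is NA. Note that row D has an NA only in note, which was dropped when selecting keep, so it still counts as complete.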