Exercise

Divide and conquer: train and test sets

When we use a statistical method to predict something, we need data that the model has not seen in order to test how well its predictions fit. Splitting the original data into train and test sets lets us check how well the model works.

The model is trained on the train set, and predictions are then made on the test set, which plays the role of new data. Because the true classes (labels) of the test data are known, you can compare them with the predictions and calculate how well the model performed.
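As a minimal sketch of that comparison, the R snippet below cross-tabulates true classes against predictions and computes the share of correct predictions. The vector `predicted` and the four class names are made up for illustration; in practice the predictions would come from a fitted model.

```r
# Hypothetical class labels, used only for illustration
lv <- c("low", "med_low", "med_high", "high")

# True classes of four test observations (assumed values)
correct_classes <- factor(c("low", "high", "low", "med_low"), levels = lv)

# Hypothetical predictions for the same four observations
predicted <- factor(c("low", "high", "med_low", "med_low"), levels = lv)

# Cross-tabulate true classes against predicted classes
confusion <- table(correct = correct_classes, predicted = predicted)
print(confusion)

# Proportion of test observations that were classified correctly
accuracy <- mean(predicted == correct_classes)
print(accuracy)
```

The diagonal of the table counts the correct predictions, and every off-diagonal cell is a misclassification.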

Time to split our data!

Instructions

  • Use the function nrow() on boston_scaled to get the number of rows in the dataset. Save the number of rows in n.
  • Execute the code that randomly chooses 80% of the rows and saves the row numbers to ind.
  • Create the train set by selecting the rows whose numbers are saved in ind.
  • Create the test set by excluding the rows that are used in the train set.
  • Take the crime classes from the test set and save them as correct_classes.
  • Execute the code to remove the crime variable from the test set.
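
The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the exercise's own solution code: the small data frame built here is a hypothetical stand-in for boston_scaled (the real one comes from earlier in the course), and the seed is added only to make the random split reproducible.

```r
set.seed(123)  # assumption: fixed seed so the random split is reproducible

# Hypothetical stand-in for boston_scaled: two numeric columns and a crime factor
boston_scaled <- data.frame(
  rm    = rnorm(50),
  lstat = rnorm(50),
  crime = factor(sample(c("low", "med_low", "med_high", "high"),
                        50, replace = TRUE))
)

# Number of rows in the dataset
n <- nrow(boston_scaled)

# Randomly choose 80% of the row numbers
ind <- sample(n, size = floor(n * 0.8))

# Train set: the sampled rows
train <- boston_scaled[ind, ]

# Test set: every row that is not in the train set (negative indexing)
test <- boston_scaled[-ind, ]

# Save the true crime classes of the test set
correct_classes <- test$crime

# Remove the crime variable from the test set
test$crime <- NULL
```

Removing crime from the test set ensures the model cannot see the answers when predicting; correct_classes keeps them for checking the predictions afterwards.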