Helsinki Open Data Science
  • 1

    Regression and model validation

    Data wrangling, simple regression, multiple regression, regression diagnostics

  • 2

    Logistic regression

    Regression for binary outcomes, training and testing a (predictive) model, cross-validation

  • 3

    Clustering and classification

    Datasets in R, Linear Discriminant Analysis (LDA) and K-means clustering

  • 4

    Dimensionality reduction techniques

    Principal component analysis (PCA), Correspondence analysis (CA)

  • 5

    Analysis of longitudinal data

    Graphical Displays and Summary Measure Approach, Linear Mixed Effects Models for Normal Response Variables


Exercise

Datasets inside R

Welcome to the Clustering and classification chapter.

R ships with many (usually small) built-in datasets, and many packages include datasets of their own. Some of these datasets are quite famous (like the Iris flower data) and are frequently used for teaching or for demonstrating statistical methods.
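As a rough illustration of where these datasets live, the sketch below lists what the base `datasets` package provides, peeks at `iris`, and checks that `Boston` is among the datasets in MASS. The variable names `builtin` and `mass_sets` are made up for this example, and it assumes the MASS package is installed (it normally comes with a standard R installation).

```r
# List the datasets that ship with the base 'datasets' package
builtin <- data(package = "datasets")
head(builtin$results[, c("Item", "Title")])

# Built-in datasets such as iris can be used without loading anything
head(iris)
dim(iris)

# Packages carry their own datasets; list the names of those in MASS
mass_sets <- data(package = "MASS")$results[, "Item"]
"Boston" %in% mass_sets
```

Calling `data()` with no arguments lists the datasets of all currently attached packages.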

This week we will be using the Boston dataset from the MASS package. Let's see what it looks like!

Instructions

  • Load the Boston dataset from the MASS package.
  • Explore the Boston dataset: look at its structure with str() and use summary() to see the details of the variables.
  • Draw the plot matrix with pairs().
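One possible way to work through the steps above is sketched here. It is not necessarily the exercise's reference solution, and it assumes the MASS package is installed.

```r
# Attach MASS and load the Boston data into the workspace
library(MASS)
data("Boston")

# Structure: number of observations, variable names and types
str(Boston)

# Per-variable summaries (min, quartiles, mean, max)
summary(Boston)

# Scatter plot matrix of every pair of variables
pairs(Boston)
```

With this many variables the plot matrix is dense, so passing a subset of columns to `pairs()` can make individual panels easier to read.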