Exercise

Towards clustering: distance measures

The similarity or dissimilarity of objects can be quantified with a distance measure. Different measures suit different types of data. The most common one is the Euclidean distance, which is the ordinary straight-line distance between two points.
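To make the idea concrete, here is a minimal sketch that computes the Euclidean distance by hand, alongside the Manhattan distance that appears later in this exercise. The vectors x and y are made-up observations used only for illustration.

```r
# Two hypothetical observations with three numeric variables each
x <- c(1, 2, 3)
y <- c(4, 6, 3)

# Euclidean distance: square root of the summed squared differences
d_euclidean <- sqrt(sum((x - y)^2))

# Manhattan distance: sum of the absolute differences
d_manhattan <- sum(abs(x - y))

d_euclidean  # 5
d_manhattan  # 7
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of points, because it adds up the coordinate-wise differences instead of taking the straight line.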

R has several functions for calculating distances. In this exercise, we will use base R's dist() function. It computes the pairwise distances between the rows (observations) of a dataset and returns them as an object of class dist. Conceptually this is a square, symmetric matrix with one row and one column per observation; the dist object stores only its lower triangle. Because the number of pairs grows quadratically with the number of observations, computing the distance matrix for a large dataset is time consuming and storing it can take a lot of memory.
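The following sketch shows what dist() returns on a tiny example and why the result grows quickly with the number of observations. The matrix m and the observation count n are made-up values chosen for illustration.

```r
# A small hypothetical dataset: 4 observations, 2 variables
m <- matrix(c(0, 0,
              3, 4,
              6, 8,
              0, 1),
            ncol = 2, byrow = TRUE)

d <- dist(m)            # Euclidean distance by default
class(d)                # "dist"
length(d)               # 6, i.e. 4 * 3 / 2 pairwise distances

# Expand to the full square matrix (symmetric, zeros on the diagonal)
full <- as.matrix(d)
dim(full)               # 4 4

# The number of stored distances grows quadratically with n
n <- 10000
n_pairs <- n * (n - 1) / 2
n_pairs                 # 49995000 pairs, roughly 400 MB at 8 bytes each
```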

Instructions

  • Load the MASS package and the Boston dataset from it.
  • Create dist_eu by calling the dist() function on the Boston dataset. Note that by default, the function uses the Euclidean distance.
  • Look at the summary of dist_eu.
  • Next, create the object dist_man that contains the Manhattan distance matrix of the Boston dataset.
  • Look at the summary of dist_man.
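
One possible way to carry out these steps is sketched below. It assumes the MASS package is installed (it ships with standard R distributions) and uses the method argument of dist() to switch from the default Euclidean distance to the Manhattan distance.

```r
# Load the MASS package and the Boston dataset from it
library(MASS)
data("Boston")

# Euclidean distance matrix (the default method of dist())
dist_eu <- dist(Boston)

# Look at the summary of the Euclidean distances
summary(dist_eu)

# Manhattan distance matrix
dist_man <- dist(Boston, method = "manhattan")

# Look at the summary of the Manhattan distances
summary(dist_man)
```

Comparing the two summaries shows how the choice of measure changes the scale of the distances. Note that the variables are used here on their original scales, so variables with large numeric ranges dominate both distance matrices.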