Exercise

K-means clustering

K-means is probably the best-known and most widely used clustering method. It is an unsupervised method that assigns observations to groups, or clusters, based on their similarity. In the previous exercise we got the hang of distances. The kmeans() function calculates the distances it needs automatically, but it is good to know the basics. Let's cluster a bit!
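To make the link between distances and clusters concrete, here is a minimal sketch on made-up toy data (the toy matrix, the seed and the nstart value are illustrative assumptions, not part of the exercise). It shows that kmeans() is given the data itself rather than a precomputed distance matrix, and that it returns a cluster number for every row.

```r
# Toy data: two well-separated groups in two dimensions (made-up values)
set.seed(123)
toy <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

# Euclidean distances between all rows, as in the previous exercise
d <- dist(toy)

# kmeans() takes the data, not the distance matrix;
# nstart = 10 repeats the random start and keeps the best result
km_toy <- kmeans(toy, centers = 2, nstart = 10)

km_toy$cluster   # cluster number for each row
km_toy$centers   # coordinates of the two cluster centers
```

Rows from the same group should end up with the same cluster number, although which group is labelled 1 and which 2 can vary between runs.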

Instructions

  • First, change the centers argument in the kmeans() function to 4 and run the clustering code.
  • Plot the Boston data with pairs(). Adjust the code by adding the col argument, and set the color based on the clusters that k-means produced. You can access the cluster numbers with km$cluster. Which variables seem to affect the clustering results? Note: with pairs() you can reduce the number of pairs to see the plots more clearly. On line 7, just replace Boston with, for example, Boston[6:10] to pair up 5 columns (columns 6 to 10).
  • Try a different number of clusters: 1, 2 and 3 (leave it at 3). Visualize the results.
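The steps above could look roughly like the sketch below. The exercise's own script is not shown here, so this is an assumption about its shape: it loads Boston from the MASS package, and the seed value is arbitrary and only there to make the random starting centers repeatable.

```r
library(MASS)   # assumed source of the Boston data set
data("Boston")

# k-means starts from randomly chosen centers, so fix the seed
set.seed(123)

# k-means clustering with 3 centers
km <- kmeans(Boston, centers = 3)

# how many observations fell into each cluster
table(km$cluster)

# plot columns 6 to 10 only, colored by cluster number
pairs(Boston[6:10], col = km$cluster)
```

Changing centers to 1, 2 or 4 and re-running the last lines gives the other plots the instructions ask for.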