Exercise

PCA with R

Principal Component Analysis (PCA) can be performed by two slightly different matrix decomposition methods from linear algebra: the Eigenvalue Decomposition and the Singular Value Decomposition (SVD).

There are two functions in the default package distribution of R that can be used to perform PCA: princomp() and prcomp(). The princomp() function uses the eigenvalue decomposition, while prcomp() uses the SVD and is the preferred, more numerically accurate method.
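To see how the two functions relate, here is a minimal sketch that runs both on the same standardized data. The built-in USArrests data set is used purely as a stand-in (an assumption, it is not part of this exercise). The two functions find the same components, but princomp() divides by n when estimating variances while prcomp() divides by n - 1, so the reported standard deviations differ by a constant factor.

```r
# Stand-in data (assumption): the built-in USArrests data set, standardized
X <- scale(USArrests)
n <- nrow(X)

pca_eigen <- princomp(X)  # eigenvalue decomposition of the covariance matrix
pca_svd   <- prcomp(X)    # singular value decomposition of the data matrix

# Standard deviations of the principal components from each function
sd_eigen <- unname(pca_eigen$sdev)
sd_svd   <- pca_svd$sdev

# princomp() uses divisor n, prcomp() uses divisor n - 1,
# so the two differ only by the factor sqrt((n - 1) / n)
same_up_to_divisor <- all.equal(sd_eigen, sd_svd * sqrt((n - 1) / n))
```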

Both methods quite literally decompose a data matrix into a product of smaller matrices, which lets us extract the underlying principal components. This makes it possible to approximate the data with a lower-dimensional representation by keeping only the first few principal components.
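The decomposition described above can be sketched directly with base R's eigen() and svd(). This is only an illustration under stated assumptions: the built-in USArrests data set stands in for the exercise data, and k = 2 is an arbitrary choice for the number of components kept.

```r
# Stand-in data (assumption): standardized USArrests
X <- scale(USArrests)
n <- nrow(X)

# Route 1: eigenvalue decomposition of the covariance matrix
eig <- eigen(cov(X))

# Route 2: singular value decomposition of the data matrix itself
sv <- svd(X)

# Both routes give the same variances for the principal components
var_eig <- eig$values
var_svd <- sv$d^2 / (n - 1)
variances_match <- all.equal(var_eig, var_svd)

# Lower-dimensional approximation: keep only the first k components
k <- 2  # arbitrary choice for illustration
X_approx <- sv$u[, 1:k] %*% diag(sv$d[1:k]) %*% t(sv$v[, 1:k])

# Squared reconstruction error equals the sum of the discarded
# squared singular values
approx_error <- sum((X - X_approx)^2)
```

The columns of sv$v are the principal component directions, and X %*% sv$v gives the coordinates of the observations on those components.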

Instructions

  • Create human_std by standardizing the variables in human.
  • Print out summaries of the standardized variables. What are the means? Do you know the standard deviations? (hint: ?scale)
  • Use prcomp() to perform principal component analysis on the standardized data. Save the results in the object pca_human.
  • Use biplot() to draw a biplot of pca_human.
  • Experiment with the argument cex of biplot(). It should be a vector of length 2 and it can be used to scale the labels in the biplot. Try for example cex = c(0.8, 1). Which number affects what?
  • Add the argument col = c("grey40", "deeppink2") to the biplot() call.
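The steps above could look roughly like the following sketch. The human data frame is not available outside the exercise environment, so the built-in USArrests data set is assigned to human as a stand-in (an assumption); with the real data the first line would be dropped.

```r
# Stand-in for the exercise data (assumption): replace with the real `human`
human <- USArrests

# Standardize the variables: subtract column means, divide by column sds
human_std <- scale(human)

# Summaries of the standardized variables
summary(human_std)

# Perform principal component analysis (SVD method)
pca_human <- prcomp(human_std)

# Biplot of the first two components, with scaled labels
# and separate colors for the two sets of labels
biplot(pca_human, choices = 1:2,
       cex = c(0.8, 1),
       col = c("grey40", "deeppink2"))
```

After scale(), every column has mean 0 and standard deviation 1, which can be checked with colMeans(human_std) and apply(human_std, 2, sd).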