Principal component analysis (PCA) is one of the most important tools of data analysis.
Thanks to R's intuitive built-in prcomp function, I never had to remember the related
undergraduate-level linear algebra. If someone asked me what the principal components are,
I'd answer that they are the directions of maximum variation in the data, but I couldn't say more.
So I decided to reduce my ignorance and review the method in more detail.
On the following pages we assume:
- vectors (or points) in \(d\)-dimensional space are \(d \times 1\) matrices (i.e. columns)
- the data are centered, i.e. every variable has zero mean (otherwise we can always shift the origin to the data mean)
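
To make the two assumptions concrete, here is a minimal R sketch. The toy matrix `X` and the names `Xc`, `x1`, `p_raw` and `p_cen` are made up for illustration. It also assumes the usual R layout with observations stored as rows, so a single point has to be reshaped to get the \(d \times 1\) column used in the text.

```r
# Hypothetical toy data: 5 observations (rows) of d = 2 variables (columns)
X <- matrix(c(2, 4, 6, 8, 10,
              1, 3, 2, 5, 4), ncol = 2)

# Center the data: subtract each column's mean, i.e. shift the origin
Xc <- scale(X, center = TRUE, scale = FALSE)
colMeans(Xc)  # zero for every variable, up to floating-point error

# One observation written as a d x 1 column matrix
x1 <- matrix(Xc[1, ], ncol = 1)
dim(x1)

# prcomp centers by default (center = TRUE), so raw and centered input
# should give the same directions; abs() guards against sign flips
p_raw <- prcomp(X)
p_cen <- prcomp(Xc)
all.equal(abs(p_raw$rotation), abs(p_cen$rotation))
```

Since prcomp does the centering itself by default, the assumption costs nothing in practice; it mainly keeps the formulas short.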