Principal Component Analysis

Principal Component Analysis (PCA) is one of the most widely applied statistical techniques for reducing the dimension of data while losing as little information as possible. Nowadays, PCA is also popular in unsupervised machine learning. PCA leverages the spectral decomposition of a matrix, a well-known concept in linear algebra.


In this article, we discuss how to implement PCA manually, though many statistical packages, such as EViews and Stata, conduct PCA within a few clicks. Open-source software such as R and Python also offers packages that produce principal components with ease.


We use matrices to derive principal components and denote matrices with capital letters.

Suppose we have a matrix `X_{m \times n}`, with `m` observations and `n` variables, whose dimension we need to reduce.

We need to carry out some preprocessing steps.

First, we standardize `X`, and then we mean-center the standardized `X`.

` X_\text{std}=\frac{X-\bar{X}}{\sigma_X}`

`Z=X_\text{std}-\bar{X}_\text{std}`

We store the mean-centered standardized `X` in `Z`. Note that standardization already subtracts the column means, so the mean of `X_\text{std}` is zero and this step leaves `Z` equal to `X_\text{std}`; it is shown for completeness. This completes the preprocessing.
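As a minimal sketch of these preprocessing steps in NumPy (the data matrix below is made up for illustration):

```python
import numpy as np

# Hypothetical data: m = 5 observations, n = 3 variables.
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.0],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4]])

# Step 1: standardize each column (subtract its mean, divide by its std).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: mean-center the standardized data. Standardization already
# removed the column means, so Z comes out equal to X_std.
Z = X_std - X_std.mean(axis=0)
```

After these steps each column of `Z` has mean zero and unit standard deviation.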

Now, we proceed to estimate the variance-covariance matrix of `Z`.

`C = \frac{1}{m-1}Z^TZ`

`C` is the sample variance-covariance matrix of `Z`. Since `Z` has `m` rows (observations) and `n` columns (variables), the divisor is `m-1` and `C` is a square `n \times n` matrix.
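As a sketch (with randomly generated data standing in for real observations), the covariance matrix is one matrix product, and it can be checked against NumPy's `np.cov`, which also divides by `m - 1`:

```python
import numpy as np

# Hypothetical standardized data: 50 observations, 3 variables.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(50, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized, mean zero

# Sample variance-covariance matrix: n x n.
m = Z.shape[0]
C = (Z.T @ Z) / (m - 1)
```

Because `Z` is already mean-centered, this matches `np.cov(Z, rowvar=False)` exactly.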

Next, we estimate the eigenvalues and eigenvectors of `C` using spectral decomposition, also called eigendecomposition, which factors a matrix into its eigenvalues and eigenvectors.

` C = \Phi \Lambda \Phi^{-1}`

The spectral decomposition of `C` - where `C` must be a non-defective square matrix - decomposes `C` into `\Phi`, the matrix of eigenvectors, and `\Lambda`, a diagonal matrix containing the eigenvalues. Because `C` is symmetric, `\Phi` is orthogonal, so `\Phi^{-1} = \Phi^T`.

Now, we sort the eigenvalues in descending order and reorder the eigenvectors (the columns of `\Phi`) to match.
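A sketch of the decomposition and the reordering step. `np.linalg.eigh` is the appropriate routine for a symmetric matrix, but it returns eigenvalues in ascending order, so we reverse them (the covariance matrix below is a hypothetical example):

```python
import numpy as np

# Hypothetical symmetric 3 x 3 covariance matrix.
C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

# eigh handles symmetric matrices; eigenvalues come back ascending.
eigvals, Phi = np.linalg.eigh(C)

# Reorder eigenvalues descending, and reorder eigenvector columns to match.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
Phi_star = Phi[:, order]

# Sanity check: C = Phi* Lambda Phi*^T, since Phi is orthogonal for symmetric C.
C_rebuilt = Phi_star @ np.diag(eigvals) @ Phi_star.T
```

Reordering the eigenvector columns together with the eigenvalues leaves the decomposition intact, which the reconstruction check confirms.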

We then obtain the principal components.

` PC = Z\Phi^*`

Here, `\Phi^*` is the matrix of eigenvectors whose columns are arranged according to the descending order of their eigenvalues. To reduce the dimension from `n` to `k`, keep only the first `k` columns of `PC`.
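A self-contained end-to-end sketch of the projection (variable names are my own, and the data is randomly generated for illustration). A useful check is that the columns of `PC` are uncorrelated and the variance of each column equals its eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
X = rng.normal(size=(100, 4))                # 100 observations, 4 variables
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardized, mean-centered

m = Z.shape[0]
C = (Z.T @ Z) / (m - 1)                      # variance-covariance matrix
eigvals, Phi = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
Phi_star = Phi[:, order]                     # eigenvectors, descending order

PC = Z @ Phi_star                            # principal components
```

The covariance matrix of `PC` is diagonal with the eigenvalues on its diagonal, which is exactly what "uncorrelated components ordered by explained variance" means.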

Finally, factor loadings are obtained by multiplying each eigenvector in `\Phi^*` by the square root of its corresponding eigenvalue.

`\text{Loadings} = \Phi^* \sqrt{\Lambda}`
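In code, this scaling is a single broadcast multiplication. A self-contained sketch (variable names are my own; the data is random):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
X = rng.normal(size=(80, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

C = (Z.T @ Z) / (Z.shape[0] - 1)
eigvals, Phi = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
Phi_star = Phi[:, order]

# Scale column j of Phi_star by sqrt(eigvals[j]); broadcasting applies
# the row vector of square roots across the eigenvector columns.
loadings = Phi_star * np.sqrt(eigvals)
```

Since the eigenvectors have unit length, each loading column has norm `\sqrt{\lambda_j}`, and the loadings reproduce the covariance matrix via `loadings @ loadings.T`.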
