Principal Component Analysis (PCA) is one of the most widely applied statistical techniques for reducing the dimension of data while losing as little information as possible. Nowadays, PCA is also popular in unsupervised machine learning. PCA leverages the spectral decomposition of a matrix, a well-known concept in linear algebra.
Source: Understanding Principal Component Analysis | by Trist'n Joseph | Towards Data Science
In this article, we discuss how to implement PCA manually, though many statistical software packages, such as EViews and STATA, will conduct PCA within a few clicks. Also, open-source software, such as R and Python, has packages that produce principal components with ease.
Principal Component Analysis in Python Manually - Jovian
We use matrices to derive principal components. We denote a matrix with a capital letter.
Suppose we have `X_{m \times n}`, a data matrix with `m` observations and `n` variables, and we need to reduce its dimension.
We need to carry out some preprocessing steps.
First, we standardize `X`, and then we mean-center the standardized `X`.
` X_\text{std}=\frac{X-\bar{X}}{\sigma_X}`
`Z = X_\text{std} - \bar{X}_\text{std}`
We store the mean-centered standardized `X` in `Z`. Note that standardization already centers each column, so `Z` is numerically equal to `X_\text{std}`; we keep the second step for completeness. This finishes the preprocessing.
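The two preprocessing steps can be sketched with NumPy as follows; the data matrix here is a hypothetical example, not taken from the article.

```python
import numpy as np

# Hypothetical example data: m = 5 observations, n = 3 variables.
X = np.array([[2.0, 4.0, 1.0],
              [3.0, 6.0, 2.0],
              [5.0, 5.0, 3.0],
              [7.0, 8.0, 2.0],
              [9.0, 7.0, 4.0]])

# Step 1: standardize each column (subtract its mean, divide by its std).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: mean-center the standardized data. Standardization already
# centers the columns, so Z is numerically equal to X_std.
Z = X_std - X_std.mean(axis=0)
```

After these steps, every column of `Z` has mean 0 and standard deviation 1.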
Now, we proceed to estimate the variance-covariance matrix of `Z`.
`C = \frac{1}{m-1}Z^TZ`
`C` is the matrix containing the variances and covariances of the columns of `Z`. Since `Z` holds `m` observations, we divide by `m-1` to obtain the unbiased sample estimate. `C` is a square matrix of size `n \times n` in our example.
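A minimal sketch of this step, again on hypothetical randomly generated data:

```python
import numpy as np

# Hypothetical standardized, mean-centered data Z (m = 5, n = 3).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Sample variance-covariance matrix: divide by m - 1 for the
# unbiased estimate (the same convention np.cov uses).
m = Z.shape[0]
C = Z.T @ Z / (m - 1)
```

The result is an `n x n` symmetric matrix and matches `np.cov(Z, rowvar=False)`.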
Furthermore, we estimate the eigenvalues and eigenvectors through a popular technique called spectral decomposition of a matrix, also known as eigendecomposition. The spectral decomposition factorizes a matrix in terms of its eigenvalues and eigenvectors.
` C = \Phi \Lambda \Phi^{-1}`
The spectral decomposition of `C`, where `C` must be a non-defective (diagonalizable) square matrix, decomposes `C` into `\Phi`, the matrix of eigenvectors, and `\Lambda`, a diagonal matrix containing the eigenvalues. Because `C` is symmetric, `\Phi` can be chosen orthogonal, so `\Phi^{-1} = \Phi^T`.
Now, we need to sort the eigenvalues in descending order and reorder the eigenvectors (the columns of `\Phi`) to match.
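The decomposition and sorting can be sketched as follows; the covariance matrix here is built from hypothetical data.

```python
import numpy as np

# Hypothetical covariance matrix C built from standardized data.
rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 3))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
C = Z.T @ Z / (Z.shape[0] - 1)

# eigh is the routine for symmetric matrices; it returns eigenvalues
# in ascending order, so we reverse the order to get descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals_sorted = eigvals[order]      # diagonal of Lambda, descending
eigvecs_sorted = eigvecs[:, order]   # Phi*, columns matched to eigenvalues
```

Using `eigh` rather than the general `eig` guarantees real eigenvalues and orthonormal eigenvectors for the symmetric matrix `C`.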
Finally, we obtain the principal components.
` PC = Z\Phi^*`
Here, `\Phi^*` is the matrix of eigenvectors whose columns are arranged according to the descending order of the eigenvalues.
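The projection step can be sketched end to end on hypothetical data:

```python
import numpy as np

# Hypothetical standardized data and sorted eigenvectors Phi*.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = Z.T @ Z / (Z.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
phi_star = eigvecs[:, order]

# PC = Z Phi*: project the data onto the sorted eigenvectors.
PC = Z @ phi_star

# Keeping only the first k columns reduces the dimension from n to k.
PC_reduced = PC[:, :2]
```

As a sanity check, the columns of `PC` are uncorrelated and their sample variances equal the sorted eigenvalues, which is why dropping the trailing columns discards the least possible variance.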
Now, the factor loadings are obtained by multiplying the eigenvectors `(\Phi^*)` by the square roots of the corresponding eigenvalues, that is, scaling each column of `\Phi^*` by the square root of its matching eigenvalue.
`\text{Loadings} = \Phi^* \sqrt{\Lambda}`
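A minimal sketch of the loadings computation, continuing the same hypothetical setup:

```python
import numpy as np

# Hypothetical eigendecomposition of a correlation-type matrix C.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = Z.T @ Z / (Z.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
lam, phi_star = eigvals[order], eigvecs[:, order]

# Loadings: scale each eigenvector column by the square root of its
# eigenvalue. They reproduce C: loadings @ loadings.T == C.
loadings = phi_star * np.sqrt(lam)
```

Each entry of `loadings` can be read as the correlation-scale contribution of a variable to a component, which is why analysts usually inspect the loadings rather than the raw eigenvectors.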