A Lagrange Multiplier Approach to PCA

Learn how Principal Component Analysis uses Lagrange multipliers to maximize data variance.


Principal Component Analysis (PCA) aims to find the directions (principal components) that maximize the variance of the projected data while ensuring that these directions are orthogonal and of unit length. This optimization problem can be elegantly solved using the method of Lagrange multipliers.

In this derivation, we'll:

  1. Set up the optimization problem: Maximize the variance of the projected data.
  2. Introduce the Lagrangian: Incorporate the constraints using Lagrange multipliers.
  3. Derive the eigenvalue equation: Show that maximizing variance leads to solving an eigenvalue problem.

1. Setting Up the Optimization Problem

Objective: Find a vector $\mathbf{w}$ (a principal component) that maximizes the variance of the projected data $\mathbf{w}^\top \mathbf{X}$.

Given: a centered data matrix $\mathbf{X}$ (each variable has zero mean) and its covariance matrix $\mathbf{C}$, so that the variance of the projected data $\mathbf{w}^\top \mathbf{X}$ equals $\mathbf{w}^\top \mathbf{C} \mathbf{w}$.

Constraint: the direction must have unit length, i.e. $\mathbf{w}^\top \mathbf{w} = 1$.
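To make this setup concrete, here is a minimal NumPy sketch (the variable names and the population covariance convention $\mathbf{C} = \frac{1}{n}\mathbf{X}\mathbf{X}^\top$ are illustrative assumptions, not taken from the text): it centers a small data matrix, forms its covariance matrix, and shows that the variance of a unit-norm projection equals $\mathbf{w}^\top \mathbf{C} \mathbf{w}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 variables (rows) x 200 samples (columns), so a projection is w^T X.
X = rng.normal(size=(3, 200))
X[1] += 0.8 * X[0]                                # introduce some correlation

X_centered = X - X.mean(axis=1, keepdims=True)    # zero-mean each variable
n = X_centered.shape[1]
C = (X_centered @ X_centered.T) / n               # covariance matrix (population form)

# Any candidate direction must satisfy the constraint w^T w = 1.
w = rng.normal(size=3)
w /= np.linalg.norm(w)

projected = w @ X_centered                        # the projected data w^T X
print(np.var(projected))                          # variance of the projection
print(w @ C @ w)                                  # same number: w^T C w
```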

2. Formulating the Lagrangian

Let's say we have an objective function $f(x)$ to maximize, subject to an equality constraint $g(x) = c$ and an inequality constraint $h(x) \le d$.

The Lagrangian $L(x, \lambda, \mu)$ incorporating both constraints is defined as:

$$L(x, \lambda, \mu) = f(x) + \lambda \{g(x) - c\} + \mu \{h(x) - d\}$$

where $\lambda$ and $\mu$ are the Lagrange multipliers associated with the equality and inequality constraints, respectively.
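To see the recipe in action before applying it to PCA, here is a small SymPy sketch on a toy problem invented purely for illustration: maximize $f(x, y) = x + y$ subject to the single equality constraint $x^2 + y^2 = 1$, by solving for the stationary points of the Lagrangian.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

f = x + y                        # objective to maximize
g = x**2 + y**2                  # constraint function, required to equal c = 1

# Lagrangian: L = f + lam * (g - c)
L = f + lam * (g - 1)

# Stationarity: all partial derivatives of L must vanish.
stationarity = [sp.diff(L, v) for v in (x, y, lam)]
solutions = sp.solve(stationarity, (x, y, lam), dict=True)

for sol in solutions:
    print(sol, 'f =', f.subs(sol))
# One stationary point, x = y = 1/sqrt(2), attains the maximum f = sqrt(2);
# the other is the constrained minimum.
```

PCA follows exactly this pattern, with a quadratic objective and a single equality (unit-norm) constraint.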

To maximize $\mathbf{w}^\top \mathbf{C} \mathbf{w}$ subject to $\mathbf{w}^\top \mathbf{w} = 1$, we introduce a Lagrange multiplier $\lambda$ and construct the Lagrangian function:

$$\mathcal{L}(\mathbf{w}, \lambda) = \mathbf{w}^\top \mathbf{C} \mathbf{w} - \lambda (\mathbf{w}^\top \mathbf{w} - 1)$$

3. Deriving the Eigenvalue Equation

Compute the gradient of the Lagrangian with respect to $\mathbf{w}$ and set it to zero:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 0$$

Calculate the partial derivatives:

  1. Derivative of the objective term (using the symmetry of $\mathbf{C}$):

$$\frac{\partial}{\partial \mathbf{w}} (\mathbf{w}^\top \mathbf{C} \mathbf{w}) = 2 \mathbf{C} \mathbf{w}$$

  2. Derivative of the constraint term:

$$\frac{\partial}{\partial \mathbf{w}} \left[ -\lambda (\mathbf{w}^\top \mathbf{w} - 1) \right] = -2 \lambda \mathbf{w}$$
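Before combining the two terms, here is a quick numerical sanity check of both derivatives above, a minimal sketch using a random symmetric matrix and a hypothetical finite-difference helper `numerical_grad` (names chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random symmetric C (a covariance matrix is always symmetric), random w and lambda.
A = rng.normal(size=(4, 4))
C = (A + A.T) / 2
w = rng.normal(size=4)
lam = 0.7

def numerical_grad(fn, w, eps=1e-6):
    """Central finite-difference gradient of a scalar function fn at w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (fn(w + step) - fn(w - step)) / (2 * eps)
    return grad

# d/dw (w^T C w) = 2 C w
print(np.allclose(numerical_grad(lambda v: v @ C @ v, w), 2 * C @ w, atol=1e-5))

# d/dw [-lam (w^T w - 1)] = -2 lam w
print(np.allclose(numerical_grad(lambda v: -lam * (v @ v - 1), w), -2 * lam * w, atol=1e-5))
```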

Set the Gradient to Zero:

$$2 \mathbf{C} \mathbf{w} - 2 \lambda \mathbf{w} = \mathbf{0}$$

Simplify by dividing both sides by 2:

$$\mathbf{C} \mathbf{w} = \lambda \mathbf{w}$$

Interpretation: the optimal $\mathbf{w}$ must be an eigenvector of the covariance matrix $\mathbf{C}$, and $\lambda$ is the corresponding eigenvalue. Moreover, the variance of the projection is $\mathbf{w}^\top \mathbf{C} \mathbf{w} = \lambda \mathbf{w}^\top \mathbf{w} = \lambda$, so the direction of maximum variance is the eigenvector with the largest eigenvalue.

By decomposing the covariance matrix, PCA transforms the correlated variables into a new set of uncorrelated variables (principal components), ordered by the amount of variance they capture.
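The eigenvalue equation is easy to verify numerically. The sketch below (an illustrative check on a covariance matrix built from random data, not a general proof) uses `np.linalg.eigh` to confirm that $\mathbf{C}\mathbf{w} = \lambda\mathbf{w}$ holds for the top eigenvector, and that no random unit vector projects with more variance than that eigenvector's eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(2)

# Build a covariance matrix from random centered data.
X = rng.normal(size=(3, 500))
X -= X.mean(axis=1, keepdims=True)
C = (X @ X.T) / X.shape[1]

eigvals, eigvecs = np.linalg.eigh(C)   # eigh: symmetric matrices, eigenvalues in ascending order
w_top = eigvecs[:, -1]                 # eigenvector with the largest eigenvalue
lam_top = eigvals[-1]

print(np.allclose(C @ w_top, lam_top * w_top))   # C w = lambda w holds

# The top eigenvalue bounds the projected variance of any unit-length direction.
random_dirs = rng.normal(size=(1000, 3))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
random_variances = np.einsum('ij,jk,ik->i', random_dirs, C, random_dirs)
print(random_variances.max() <= lam_top + 1e-12)
```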

4. Finding Principal Components

PCA seeks directions (principal components) along which the variance of the data is maximized. The covariance matrix's eigenvalues and eigenvectors reveal these directions.

Eigenvalues and Eigenvectors: each eigenvector of $\mathbf{C}$ defines a principal direction, and its eigenvalue equals the variance of the data along that direction.

Ordering Principal Components: sort the eigenvalues in decreasing order, $\lambda_1 \ge \lambda_2 \ge \dots$; the eigenvector paired with $\lambda_1$ is the first principal component, the one paired with $\lambda_2$ is the second, and so on.

By choosing eigenvectors corresponding to the largest eigenvalues, you reduce the dimensionality while preserving as much information as possible.
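Putting the pieces together, here is a compact end-to-end sketch of PCA via the eigendecomposition of the covariance matrix. The function name `pca`, the data layout (columns are samples), and the covariance convention are illustrative choices, not a production implementation:

```python
import numpy as np

def pca(X, k):
    """Project a d x n data matrix X (columns are samples) onto its top-k principal components."""
    X_centered = X - X.mean(axis=1, keepdims=True)
    C = (X_centered @ X_centered.T) / X_centered.shape[1]

    eigvals, eigvecs = np.linalg.eigh(C)        # ascending order for symmetric C
    order = np.argsort(eigvals)[::-1]           # reorder: largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    W = eigvecs[:, :k]                          # top-k principal components (columns)
    scores = W.T @ X_centered                   # k x n projected data
    explained = eigvals[:k] / eigvals.sum()     # fraction of total variance each component captures
    return W, scores, explained

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 300))
X[1] += 0.9 * X[0]                              # correlated variables compress well
W, scores, explained = pca(X, k=2)
print(explained)                                # variance captured by the first two components
```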