The Ultimate Data Science Tutorial — Deep Theory, Real-World Applications & Step-by-Step Solutions
Imagine you have a rubber sheet with a grid drawn on it. You apply a transformation — stretch it, shear it, rotate it. Most arrows drawn on the sheet change direction. But some special arrows only get longer or shorter along their original line. They refuse to rotate.
Those stubborn arrows are eigenvectors. The factor by which they stretch (or shrink) is the eigenvalue.
Think of an earthquake shaking a building. The building vibrates in certain natural modes — some floors sway left-right, others twist. Each mode is an eigenvector (the shape of vibration) and the frequency of that mode corresponds to an eigenvalue. The building "wants" to vibrate in these special directions.
The word "eigen" comes from German meaning "own" or "characteristic." So eigenvectors are the characteristic directions of a transformation — the directions that belong to that matrix.
Given a square matrix \(\mathbf{A}\) of size \(n \times n\), a scalar \(\lambda\) and a non-zero vector \(\mathbf{v}\) satisfy:

$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$

where \(\mathbf{v}\) is an eigenvector of \(\mathbf{A}\) and \(\lambda\) is its corresponding eigenvalue.
If \(\mathbf{v}\) is an eigenvector, then any scalar multiple \(c\mathbf{v}\) (where \(c \neq 0\)) is also an eigenvector with the same eigenvalue. That's why we often normalize eigenvectors to have length 1, or just pick a convenient representative.
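Both the defining property and this scale-invariance are easy to check numerically. The diagonal matrix below is a made-up example for illustration, not one from the text:

```python
import numpy as np

# Illustrative matrix: stretches the x-axis by 2 and the y-axis by 3
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

v = np.array([1.0, 0.0])       # eigenvector with eigenvalue 2
for c in [1.0, -4.0, 0.5]:     # any non-zero scalar multiple...
    w = c * v
    assert np.allclose(A @ w, 2.0 * w)  # ...is still an eigenvector for lambda = 2

# Normalizing to unit length just picks one convenient representative
unit = v / np.linalg.norm(v)
print(np.linalg.norm(unit))  # 1.0
```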
Understanding what different eigenvalues look like geometrically is crucial for intuition:
| Eigenvalue \(\lambda\) | What Happens to \(\mathbf{v}\) | Visual |
|---|---|---|
| \(\lambda > 1\) | Stretched (gets longer) | → ———→ |
| \(\lambda = 1\) | Unchanged (fixed direction AND length) | → → |
| \(0 < \lambda < 1\) | Shrunk (gets shorter) | ———→ → |
| \(\lambda = 0\) | Collapsed to zero (projected out) | → · |
| \(\lambda < 0\) | Flipped (reversed direction) and scaled | → ← |
| \(\lambda = a + bi\) (complex) | Rotation + scaling (spiral) | → ↻ |
A 2×2 matrix has at most 2 independent eigenvector directions. Think of them as the two axes along which the transformation acts purely as stretching. Every other vector is a mix of these two directions and will appear to rotate because its two components stretch by different amounts.
| Type | Meaning | Where You See It |
|---|---|---|
| Real, distinct | Clear, separate scaling directions | PCA, covariance matrices |
| Real, repeated | Uniform scaling in a subspace | Scalar multiples of identity |
| Complex conjugate pairs | Rotation + spiral behavior | Oscillating systems, control theory |
| Pure imaginary | Pure rotation, no growth/decay | Undamped oscillations |
If any eigenvalue is zero, the matrix is singular (non-invertible). The eigenvector for \(\lambda=0\) lies in the null space of \(\mathbf{A}\). This means \(\mathbf{A}\) collapses some dimension to zero — information is lost.
Vectors with eigenvalue 1 are unchanged by the transformation. In Markov chains, the steady-state vector has \(\lambda = 1\). In projections, the subspace being projected onto has \(\lambda = 1\).
In dynamical systems: eigenvalues with \(|\lambda| > 1\) cause exponential growth (unstable), while \(|\lambda| < 1\) cause exponential decay (stable). This is the foundation of stability analysis.
Algebraic multiplicity = how many times \(\lambda\) appears as a root of the characteristic polynomial.
Geometric multiplicity = number of linearly independent eigenvectors for that \(\lambda\) = dimension of eigenspace = \(\dim\ker(\mathbf{A} - \lambda\mathbf{I})\).
Always: \(1 \leq \text{geometric mult.} \leq \text{algebraic mult.}\)
This is perhaps the most important section. Eigenvalues and eigenvectors are not just an abstract math concept — they solve fundamental problems that appear across every scientific and engineering discipline.
A matrix can represent a complicated transformation — shearing, stretching, rotating all at once. Eigenvalues and eigenvectors decompose this mess into simple, independent stretching motions along specific axes. It's like taking a complex sound wave and decomposing it into individual pure frequencies (Fourier transform is deeply connected to eigenvalues!).
In data science, you often have hundreds or thousands of features. Eigenvalues tell you which directions matter most. If 3 eigenvalues are huge and 997 are tiny, your data effectively lives in a 3D subspace. PCA uses exactly this idea to reduce dimensions while preserving maximum information.
Computing \(\mathbf{A}^{100}\) directly requires 99 matrix multiplications. But if you know the eigendecomposition \(\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}\), then:

$$\mathbf{A}^{100} = \mathbf{P}\mathbf{D}^{100}\mathbf{P}^{-1}$$
And \(\mathbf{D}^{100}\) is trivial — just raise each diagonal eigenvalue to the 100th power. This is how we solve Markov chains, recurrence relations, and differential equations efficiently.
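This speed-up is straightforward to demonstrate. The sketch below uses the 2×2 matrix that appears in this tutorial's code section and checks the eigendecomposition route against direct repeated multiplication:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Eigendecomposition A = P D P^{-1}
evals, P = np.linalg.eig(A)

# A^10 via scalar powers of the eigenvalues: P D^10 P^{-1}
A10_eig = P @ np.diag(evals**10) @ np.linalg.inv(P)

# Compare against repeated matrix multiplication
A10_direct = np.linalg.matrix_power(A, 10)
print(np.allclose(A10_eig, A10_direct))  # True
```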
In engineering and physics, the eigenvalues of a system matrix tell you whether the system is stable, unstable, or oscillatory:
This is the foundation of control theory — designing systems (autopilot, robotics, etc.) that behave reliably.
Systems of linear differential equations \(\frac{d\mathbf{x}}{dt} = \mathbf{A}\mathbf{x}\) have solutions of the form:

$$\mathbf{x}(t) = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 + \cdots + c_n e^{\lambda_n t}\mathbf{v}_n$$
Each eigenvector \(\mathbf{v}_i\) is an independent mode of behavior, and each eigenvalue \(\lambda_i\) determines whether that mode grows, decays, or oscillates.
Google's PageRank models the web as a huge matrix where entry \((i,j)\) represents the probability of clicking from page \(j\) to page \(i\). The dominant eigenvector (eigenvalue = 1) of this matrix gives the importance ranking of every web page. This eigenvector is the steady-state of a random web surfer.
They let you: (1) understand the geometry of transformations, (2) compress data, (3) compute matrix powers efficiently, (4) determine if systems are stable, (5) solve differential equations, and (6) rank things. Almost every area of applied math reduces to an eigenvalue problem at some point.
Here's a practical decision guide — when should you reach for eigenvalue/eigenvector analysis?
| Situation | Signal to Use Eigen-Analysis | Technique |
|---|---|---|
| Too many features in your dataset | Need to reduce dimensions | PCA (eigenvectors of covariance matrix) |
| Understanding correlations in data | Covariance matrix is symmetric → guaranteed real eigenvalues | Eigendecomposition of \(\mathbf{\Sigma}\) |
| Random walker on a graph/network | Need steady-state probability | Eigenvector with \(\lambda=1\) of transition matrix |
| Grouping similar items with graph structure | Data has network/graph connections | Spectral clustering (eigenvectors of Laplacian) |
| Will this system blow up over time? | Studying a dynamical system | Check eigenvalues of system matrix |
| Solving \(\frac{d\mathbf{x}}{dt} = \mathbf{A}\mathbf{x}\) | System of linear ODEs | General solution via eigenvalues/vectors |
| Computing \(\mathbf{A}^n\) for large \(n\) | Recurrence relations, Markov chains | Diagonalization: \(\mathbf{A}^n = \mathbf{P}\mathbf{D}^n\mathbf{P}^{-1}\) |
| Recommending products to users | User-item matrix is huge and sparse | SVD / low-rank approximation |
| Image compression | Want to store less data with minimal loss | SVD (closely related to eigendecomposition) |
| NLP: understanding word relationships | Co-occurrence matrix is large | SVD for word embeddings (like LSA) |
| Vibration / structural analysis | Finding natural frequencies | Generalized eigenvalue problem |
| Quantum mechanics | Measuring observable quantities | Eigenvalues of Hermitian operators |
Whenever you see a square matrix that represents relationships, transitions, transformations, or correlations — eigenvalue analysis will likely reveal something important about it. If you're asking "what are the most important directions/modes/patterns?" — you want eigenvectors.
Problem: You have a dataset with 500 features. Training is slow and there's noise. Which features matter most?
How eigen helps: Compute the covariance matrix \(\mathbf{\Sigma}\) of your data. Its eigenvectors point in the directions of maximum variance (principal components). The eigenvalues tell you how much variance each direction explains. Keep only the top \(k\) eigenvectors to reduce 500 features to, say, 20 — losing almost no information.
Problem: Millions of users, millions of items, very sparse rating matrix. How to predict what a user will like?
How eigen helps: SVD (built on eigendecomposition) factors the user-item matrix into latent factors. The top eigenvalues/singular values capture the dominant "taste dimensions" — maybe 50 dimensions can represent all the important patterns in millions of ratings.
Problem: You have data where clusters have irregular shapes — k-means fails badly. But you know which points are "similar."
How eigen helps: Build a similarity graph. Compute the Laplacian matrix \(\mathbf{L} = \mathbf{D} - \mathbf{W}\). The smallest eigenvalues (near zero) of \(\mathbf{L}\) reveal cluster structure — the number of zero eigenvalues equals the number of connected components. The corresponding eigenvectors embed your data into a space where k-means works perfectly.
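A minimal sketch of the zero-eigenvalue trick, using a made-up graph of two disconnected triangles (so the "clusters" are actual connected components):

```python
import numpy as np

# Adjacency matrix W for two disconnected triangles: {0,1,2} and {3,4,5}
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0

D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # graph Laplacian L = D - W

evals, evecs = np.linalg.eigh(L)  # L is symmetric, so use eigh

# Number of (near-)zero eigenvalues = number of connected components
n_components = int(np.sum(evals < 1e-10))
print(n_components)  # 2
```

The corresponding eigenvectors are constant on each component, which is exactly the embedding spectral clustering feeds to k-means.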
Problem: A term-document matrix is huge and sparse. Words with similar meanings should be grouped together.
How eigen helps: Latent Semantic Analysis uses SVD to find the hidden (latent) semantic structure. The top singular vectors capture topics — grouping synonyms and separating polysemy automatically.
Problem: Each face image is a 10,000-pixel vector. How to recognize faces efficiently?
How eigen helps: Apply PCA to a training set of faces. The eigenvectors of the covariance matrix form "eigenfaces" — ghostly base patterns. Any face can be approximated as a weighted sum of ~100 eigenfaces. Recognition becomes comparing 100 weights instead of 10,000 pixels.
Problem: Will this bridge resonate with wind? At what frequencies might this building collapse during an earthquake?
How eigen helps: The generalized eigenvalue problem \(\mathbf{K}\mathbf{v} = \omega^2 \mathbf{M}\mathbf{v}\) (stiffness and mass matrices) gives the natural frequencies (\(\omega\)) and mode shapes (eigenvectors) of the structure. Engineers design so that no natural frequency matches expected vibration sources.
Problem: What energy levels can an electron have in an atom? What states can a quantum system be in?
How eigen helps: Observable quantities (energy, momentum, spin) are represented by Hermitian operators. The eigenvalues are the possible measurement outcomes, and the eigenvectors are the corresponding quantum states. The famous time-independent Schrödinger equation is an eigenvalue problem:

$$\hat{H}\psi = E\psi$$
where \(\hat{H}\) is the Hamiltonian operator, \(E\) is the energy eigenvalue, and \(\psi\) is the wave function (eigenvector).
Problem: Design a controller for a drone so it stays stable in wind.
How eigen helps: Model the drone dynamics as \(\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}\). The eigenvalues of \(\mathbf{A}\) (or of the closed-loop matrix \(\mathbf{A}-\mathbf{B}\mathbf{K}\)) determine stability. Engineers place eigenvalues in the left half of the complex plane to ensure the system is stable and responsive.
Problem: Analyzing oscillation modes in a large power grid to prevent blackouts.
How eigen helps: The state matrix of the power system has eigenvalues that correspond to different oscillation modes. If any eigenvalue crosses into the right half-plane, the grid becomes unstable. Operators monitor eigenvalues in real-time.
Problem: Rank billions of web pages by importance.
How eigen helps: Model the web as a directed graph. The transition matrix \(\mathbf{M}\) describes a random surfer clicking links. The dominant eigenvector (for \(\lambda = 1\)) gives the long-run probability of being on each page — this IS the PageRank.
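A toy version of this computation. The 4-page link structure is invented for illustration, and the damping factor \(d = 0.85\) is the value used in the original PageRank formulation:

```python
import numpy as np

# Toy 4-page web (hypothetical links). Column j = where page j links to.
# Each column sums to 1: a column-stochastic transition matrix.
M = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])

# Damped "Google matrix": random surfer follows links with prob d,
# teleports to a uniformly random page with prob 1 - d
d, n = 0.85, M.shape[0]
G = d * M + (1 - d) / n * np.ones((n, n))

# Power iteration converges to the dominant (lambda = 1) eigenvector
r = np.ones(n) / n
for _ in range(100):
    r = G @ r

print(np.round(r, 3))         # steady-state importance of each page
print(np.allclose(G @ r, r))  # True: r is the lambda = 1 eigenvector
```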
Problem: Find communities (friend groups, interest clusters) in a social network of millions of users.
How eigen helps: The eigenvectors of the modularity matrix reveal community structure. Nodes that share similar eigenvector components belong to the same community. This is how Facebook and Twitter detect user groups at scale.
Problem: You hold 100 stocks. How correlated are they? What hidden risk factors drive your portfolio?
How eigen helps: Eigen-analysis of the correlation matrix reveals the principal risk factors. The largest eigenvalue typically corresponds to the "market factor" (all stocks move together). Smaller eigenvalues reveal sector-specific risks. This is the foundation of factor models (like Fama-French).
Problem: How does a change in one industry's output affect the entire economy?
How eigen helps: Leontief's input-output model uses eigenvalues of the technology matrix to determine if an economy can sustain itself. The dominant eigenvalue (called the Perron-Frobenius eigenvalue) must be less than 1 for the economy to be productive.
Problem: Will this animal species grow, decline, or stabilize over time?
How eigen helps: The Leslie matrix encodes birth rates and survival rates for each age group. Its dominant eigenvalue \(\lambda_1\) determines long-term behavior: if \(\lambda_1 > 1\), population grows; if \(\lambda_1 < 1\), it declines; if \(\lambda_1 = 1\), it stabilizes. The eigenvector gives the stable age distribution.
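A sketch with a hypothetical 3-age-class Leslie matrix; the birth and survival rates below are invented for illustration:

```python
import numpy as np

# Hypothetical Leslie matrix: row 0 = fecundity of each age class,
# subdiagonal = probability of surviving into the next age class
L = np.array([
    [0.0, 1.5, 1.0],
    [0.6, 0.0, 0.0],
    [0.0, 0.8, 0.0],
])

evals, evecs = np.linalg.eig(L)
i = np.argmax(np.abs(evals))   # dominant eigenvalue (Perron root)
lam1 = evals[i].real
print(round(lam1, 3))          # here lam1 > 1, so this population grows

# Stable age distribution: dominant eigenvector, normalized to sum to 1
stable = np.abs(evecs[:, i].real)
stable /= stable.sum()
print(np.round(stable, 3))
```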
Problem: Gene expression data has 20,000+ genes. Which genes distinguish cancer subtypes?
How eigen helps: PCA on gene expression matrices reveals the dominant patterns. The first few principal components (eigenvectors) often correspond to biological processes like cell cycle, immune response, or tissue type. This is used for cancer subtype classification and drug response prediction.
Problem: Will a disease become an epidemic? How fast will it spread?
How eigen helps: The basic reproduction number \(R_0\) is the dominant eigenvalue of the next-generation matrix. If \(R_0 > 1\), the disease spreads; if \(R_0 < 1\), it dies out. This was crucial for COVID-19 modeling.
Problem: A 1000×1000 grayscale image has 1,000,000 pixel values. Store it with much less data.
How eigen helps: SVD decomposes the image matrix into singular values. Keep only the top \(k\) singular values (and corresponding vectors). With \(k=50\), you reduce storage by ~90% with barely noticeable quality loss. Each singular value tells you how much "information" that component adds.
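A sketch of the idea on a small synthetic "image"; a real photo behaves the same way, just with more significant singular values:

```python
import numpy as np

# Stand-in image: a 100x100 gradient plus stripes (low effective rank)
x = np.linspace(0, 1, 100)
img = np.outer(x, x) + 0.1 * np.outer(np.sin(10 * x), np.ones(100))

U, S, Vt = np.linalg.svd(img, full_matrices=False)

k = 5  # keep only the top-k singular values
img_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Storage: k*(m + n + 1) numbers instead of m*n
compressed_ratio = k * (U.shape[0] + Vt.shape[1] + 1) / img.size
print(compressed_ratio)   # ~0.1 of the original storage

# Spectral-norm error of the best rank-k approximation equals the
# first discarded singular value (Eckart-Young theorem)
err = np.linalg.norm(img - img_k, 2)
print(err <= S[k] + 1e-8)  # True
```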
Problem: Decompose a complex 3D deformation into rotation + stretching.
How eigen helps: The polar decomposition (which uses eigenvalues) separates any transformation into a pure rotation and a pure stretch. This is essential for physics-based animation, mesh deformation, and motion capture processing.
Every application above reduces to the same core idea: find the most important directions (eigenvectors) and their importance (eigenvalues) in a system described by a matrix. The matrix might represent data correlations, network connections, physical forces, transition probabilities, or quantum states — but the math is the same.
Starting from \(\mathbf{A}\mathbf{v} = \lambda\,\mathbf{v}\), we derive the method:

$$\mathbf{A}\mathbf{v} - \lambda\mathbf{v} = \mathbf{0} \implies (\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$

For \(\mathbf{v} \neq \mathbf{0}\) to exist, the matrix \((\mathbf{A} - \lambda\mathbf{I})\) must be singular:

$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$
This is a polynomial of degree \(n\) in \(\lambda\). Its roots are the eigenvalues. This polynomial is called the characteristic polynomial.
For \(\mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\), the characteristic equation is always:
$$\lambda^2 - (a+d)\lambda + (ad - bc) = 0$$

That is: \(\lambda^2 - \text{trace}\cdot\lambda + \det = 0\). You can use the quadratic formula directly!
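The trace/determinant shortcut fits in a few lines. The helper `eig2x2` below is our own illustration of the formula, checked against `np.linalg.eig`:

```python
import numpy as np

def eig2x2(A):
    """Eigenvalues of a 2x2 matrix via the trace/determinant quadratic."""
    tr = A[0, 0] + A[1, 1]
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    disc = np.sqrt(complex(tr**2 - 4 * det))  # complex sqrt handles rotations too
    return (tr + disc) / 2, (tr - disc) / 2

A = np.array([[4.0, 1.0], [2.0, 3.0]])  # the matrix from this tutorial's code
print(eig2x2(A))                         # eigenvalues 5 and 2 (as complex numbers)
print(np.sort(np.linalg.eig(A)[0]))      # same values from NumPy
```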
For each eigenvalue \(\lambda_i\), solve:

$$(\mathbf{A} - \lambda_i\mathbf{I})\mathbf{v} = \mathbf{0}$$
This is a homogeneous system. Use Gaussian elimination (row reduction) on the matrix \((\mathbf{A} - \lambda_i\mathbf{I})\) to find the free variables, then express the solution in terms of those free variables.
The set of ALL eigenvectors for a given \(\lambda\) (plus the zero vector) forms a subspace called the eigenspace \(E_\lambda\). Its dimension equals the geometric multiplicity of \(\lambda\).
$$E_\lambda = \ker(\mathbf{A} - \lambda\mathbf{I}) = \{\mathbf{v} : (\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}\}$$

Problem: Find the eigenvalues and eigenvectors of:

$$\mathbf{A} = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}$$

Characteristic equation: \(\lambda^2 - 7\lambda + 10 = 0 \implies (\lambda-5)(\lambda-2) = 0\), so \(\lambda_1 = 5\) and \(\lambda_2 = 2\).
Quick check: \(\lambda_1 + \lambda_2 = 7 = 4+3 = \text{trace}(\mathbf{A})\) ✓ and \(\lambda_1 \cdot \lambda_2 = 10 = 4\cdot3 - 1\cdot2 = \det(\mathbf{A})\) ✓
For \(\lambda_1 = 5\): \(\mathbf{A} - 5\mathbf{I} = \begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}\). Row 1: \(-v_1 + v_2 = 0 \implies v_2 = v_1\). Let \(v_1 = 1\): \(\mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\)
For \(\lambda_2 = 2\): \(\mathbf{A} - 2\mathbf{I} = \begin{pmatrix} 2 & 1 \\ 2 & 1 \end{pmatrix}\). Row 1: \(2v_1 + v_2 = 0 \implies v_2 = -2v_1\). Let \(v_1 = 1\): \(\mathbf{v}_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}\)
Problem: Find the eigenvalues and eigenvectors of:
Expanding along the first row:
\(v_1 = 0\), \(v_2 = v_3\). Choose \(v_3 = 1\):
\(v_2 = -v_3\), \(v_1\) is free. Two free variables → two independent eigenvectors:
Geometric multiplicity = 2 = algebraic multiplicity. ✓ (Matrix is diagonalizable!)
Problem: You measured height and weight of 100 students. The covariance matrix is:

$$\mathbf{C} = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}$$

Find the principal components and determine how much you can compress the data.

Characteristic equation: \(\lambda^2 - 7\lambda + 8 = 0 \implies \lambda = \frac{7 \pm \sqrt{17}}{2}\), giving \(\lambda_1 \approx 5.56\) and \(\lambda_2 \approx 1.44\).
For \(\lambda_1 \approx 5.56\): solving \((\mathbf{C} - 5.56\mathbf{I})\mathbf{v} = \mathbf{0}\) gives \(-1.56v_1 + 2v_2 = 0 \implies v_2 \approx 0.78\,v_1\).
Normalized: \(\mathbf{v}_1 \approx \begin{pmatrix} 0.79 \\ 0.62 \end{pmatrix}\) — this points roughly 38° from the height axis.
For \(\lambda_2 \approx 1.44\): the eigenvector is perpendicular (symmetric matrix!):
\(\mathbf{v}_2 \approx \begin{pmatrix} -0.62 \\ 0.79 \end{pmatrix}\)
PC1 (eigenvector 1): A combined "body size" factor — when height increases, weight tends to increase proportionally along this direction.
PC2 (eigenvector 2): The "body shape" factor — variation perpendicular to the main trend (tall-thin vs short-heavy deviation).
Conclusion: By projecting onto PC1 alone, you capture 79.4% of the variance — reducing 2D data to 1D while losing only 20.6% of information.
Problem: A customer is either Happy (H) or Unhappy (U). Each month: 80% of happy customers stay happy, 20% become unhappy. 60% of unhappy customers become happy, 40% stay unhappy. What's the long-run distribution?

The transition matrix (columns in state order H, U) is:

$$\mathbf{M} = \begin{pmatrix} 0.8 & 0.6 \\ 0.2 & 0.4 \end{pmatrix}$$

Columns sum to 1 (it's a stochastic matrix). Each column represents "where do people in this state go next?"
\(\lambda_1 = 1\) (guaranteed for stochastic matrices!) and \(\lambda_2 = 0.2\).
\(-0.2v_1 + 0.6v_2 = 0 \implies v_1 = 3v_2\). Choose \(v_2 = 1\): \(\mathbf{v} = \begin{pmatrix} 3 \\ 1 \end{pmatrix}\)
Normalize to sum to 1 (probability): \(\boldsymbol{\pi} = \begin{pmatrix} 3/4 \\ 1/4 \end{pmatrix} = \begin{pmatrix} 0.75 \\ 0.25 \end{pmatrix}\)
In the long run, 75% of customers are happy and 25% are unhappy, regardless of the initial distribution. The eigenvalue \(\lambda_2 = 0.2\) tells us the system converges to this steady state — the rate \(0.2^n \to 0\) means convergence is fast.
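The steady state can be confirmed numerically with the 80/20 and 60/40 transition probabilities from the problem statement:

```python
import numpy as np

# Transition probabilities from the example; columns = current state (H, U)
M = np.array([[0.8, 0.6],
              [0.2, 0.4]])

# Start everyone unhappy and iterate the chain
p = np.array([0.0, 1.0])
for _ in range(20):
    p = M @ p

print(np.round(p, 4))  # [0.75 0.25], regardless of the starting distribution

# It is exactly the lambda = 1 eigenvector, normalized to sum to 1
evals, evecs = np.linalg.eig(M)
pi = evecs[:, np.argmax(evals.real)].real
pi /= pi.sum()
print(np.round(pi, 4))  # [0.75 0.25]
```

The deviation from steady state shrinks by a factor of \(\lambda_2 = 0.2\) per step, which is why 20 iterations are already far more than enough.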
Problem: Solve the system of differential equations:

$$\frac{dx}{dt} = 4x + y, \qquad \frac{dy}{dt} = 2x + 3y$$

This is \(\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}\) with \(\mathbf{A} = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}\), the same matrix from Example 1!
From Example 1: \(\lambda_1 = 5, \;\mathbf{v}_1 = \begin{pmatrix}1\\1\end{pmatrix}\) and \(\lambda_2 = 2, \;\mathbf{v}_2 = \begin{pmatrix}1\\-2\end{pmatrix}\)
The general solution is \(\mathbf{x}(t) = c_1 e^{5t}\begin{pmatrix}1\\1\end{pmatrix} + c_2 e^{2t}\begin{pmatrix}1\\-2\end{pmatrix}\). Written out:

$$x(t) = c_1 e^{5t} + c_2 e^{2t}, \qquad y(t) = c_1 e^{5t} - 2c_2 e^{2t}$$
Both eigenvalues are positive (\(\lambda = 5, 2\)), so this is an unstable node — both modes grow exponentially. The \(e^{5t}\) mode dominates for large \(t\), so the solution eventually aligns with eigenvector \(\begin{pmatrix}1\\1\end{pmatrix}\) — both \(x\) and \(y\) grow at equal rates.
If the eigenvalues were negative, the system would decay to zero (stable). If one were positive and one negative, we'd get a saddle point.
If an \(n \times n\) matrix \(\mathbf{A}\) has \(n\) linearly independent eigenvectors, we can write:

$$\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}$$
where \(\mathbf{P}\) has eigenvectors as columns and \(\mathbf{D}\) is diagonal with eigenvalues:
$$\mathbf{P} = \begin{pmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ | & | & & | \end{pmatrix}, \quad \mathbf{D} = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}$$

Computing \(\mathbf{A}^{1000}\) reduces to computing \(\lambda_i^{1000}\) — scalar exponentiation!
Not every matrix can be diagonalized! If the geometric multiplicity is less than the algebraic multiplicity for some eigenvalue, we need the Jordan Normal Form instead. However, symmetric matrices are always diagonalizable — and most matrices in data science are symmetric (covariance, correlation, Laplacian).
SVD (Singular Value Decomposition) and eigendecomposition are closely related but have important differences:
| Feature | Eigendecomposition | SVD |
|---|---|---|
| Formula | \(\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}\) | \(\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\) |
| Matrix shape | Square only (\(n \times n\)) | Any shape (\(m \times n\)) |
| Always exists? | Not always (need n independent eigenvectors) | Always exists for any matrix |
| Values | Eigenvalues (can be negative, complex) | Singular values (always non-negative real) |
| Relationship | Eigenvalues of \(\mathbf{A}\) | Singular values = \(\sqrt{\text{eigenvalues of } \mathbf{A}^T\mathbf{A}}\) |
| Key use in DS | PCA, Markov chains, spectral methods | Recommendation systems, NLP, image compression |
For a symmetric matrix \(\mathbf{A}\): eigendecomposition and SVD are the same thing (up to sign). The singular values equal the absolute values of the eigenvalues.
For a general matrix \(\mathbf{A}\): the left singular vectors are eigenvectors of \(\mathbf{A}\mathbf{A}^T\), and the right singular vectors are eigenvectors of \(\mathbf{A}^T\mathbf{A}\).
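This relationship can be verified numerically on a random rectangular matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # any rectangular matrix works

U, S, Vt = np.linalg.svd(A)

# Eigenvalues of A^T A (symmetric PSD, so eigh; returned in ascending order)
evals, evecs = np.linalg.eigh(A.T @ A)
singular_from_eig = np.sqrt(evals[::-1])  # reverse to descending, like S

print(np.allclose(S, singular_from_eig))  # True

# Right singular vectors are eigenvectors of A^T A (up to sign)
print(np.allclose(np.abs(Vt), np.abs(evecs[:, ::-1].T)))  # True
```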
One of the most important theorems in linear algebra, and the theoretical backbone of PCA:
If \(\mathbf{A}\) is a real symmetric matrix (\(\mathbf{A} = \mathbf{A}^T\)), then:

- all eigenvalues of \(\mathbf{A}\) are real;
- eigenvectors belonging to distinct eigenvalues are orthogonal;
- \(\mathbf{A}\) is orthogonally diagonalizable: \(\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T\), where \(\mathbf{Q}\) has orthonormal eigenvectors as columns and \(\boldsymbol{\Lambda}\) is diagonal.
This is why PCA works so cleanly — covariance matrices are symmetric, so their eigenvectors form a perfect orthogonal coordinate system.
A symmetric matrix can be written as a sum of rank-1 matrices:

$$\mathbf{A} = \sum_{i=1}^{n} \lambda_i \mathbf{v}_i\mathbf{v}_i^T$$
Each term \(\lambda_i \mathbf{v}_i\mathbf{v}_i^T\) is a projection onto one eigenvector, weighted by its eigenvalue. This form is directly used in PCA: keep the terms with the largest \(\lambda_i\) and discard the rest.
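A quick numerical check of this rank-1 expansion, on a small made-up symmetric matrix:

```python
import numpy as np

# Small illustrative symmetric matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

evals, evecs = np.linalg.eigh(A)

# Rebuild A as a sum of rank-1 projections lambda_i * v_i v_i^T
A_rebuilt = sum(lam * np.outer(v, v) for lam, v in zip(evals, evecs.T))
print(np.allclose(A, A_rebuilt))  # True

# PCA-style truncation: keep only the largest-eigenvalue term
i = np.argmax(evals)
A_top1 = evals[i] * np.outer(evecs[:, i], evecs[:, i])
print(A_top1)  # the best rank-1 approximation of A
```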
Eigenvalues and eigenvectors are connected to many other concepts. Here's how they all fit together:
| Concept | Connection to Eigenvalues/Eigenvectors |
|---|---|
| Determinant | \(\det(\mathbf{A}) = \prod \lambda_i\). If any \(\lambda = 0\), the determinant is 0 and the matrix is singular. |
| Trace | \(\text{tr}(\mathbf{A}) = \sum \lambda_i\). The trace equals the sum of eigenvalues. |
| Rank | For diagonalizable matrices (including all symmetric ones), rank = number of non-zero eigenvalues; a rank-deficient matrix has eigenvalue 0. |
| Inverse | Eigenvalues of \(\mathbf{A}^{-1}\) are \(1/\lambda_i\). Exists only if all \(\lambda_i \neq 0\). |
| Null Space | The null space is the eigenspace for \(\lambda = 0\). |
| Positive Definite | A symmetric matrix is positive definite iff ALL eigenvalues > 0. (Covariance matrices are positive semi-definite: all \(\lambda \geq 0\).) |
| Condition Number | \(\kappa(\mathbf{A}) = \sigma_{\max}/\sigma_{\min}\) (ratio of singular values); for symmetric \(\mathbf{A}\) this equals \(|\lambda_{\max}|/|\lambda_{\min}|\). Large condition number = numerically unstable. |
| Matrix Norm | The spectral norm \(\|\mathbf{A}\|_2 = \sigma_{\max}\) (largest singular value). For symmetric \(\mathbf{A}\): equals \(|\lambda_{\max}|\). |
| Fourier Transform | The DFT matrix has eigenvectors that are complex exponentials. Fourier analysis IS eigenvalue analysis of circulant matrices. |
| Cayley-Hamilton | Every matrix satisfies its own characteristic equation: if \(p(\lambda) = 0\) is the characteristic polynomial, then \(p(\mathbf{A}) = \mathbf{0}\). |
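Several of these identities can be verified in a few lines. The random symmetric matrix below is just for illustration (symmetric so that all eigenvalues are real and easy to compare):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B + B.T                      # symmetric: real eigenvalues
evals = np.linalg.eigvalsh(A)    # ascending order

# det = product of eigenvalues, trace = sum of eigenvalues
print(np.isclose(np.linalg.det(A), np.prod(evals)))  # True
print(np.isclose(np.trace(A), np.sum(evals)))        # True

# Eigenvalues of the inverse are the reciprocals 1/lambda_i
inv_evals = np.linalg.eigvalsh(np.linalg.inv(A))
print(np.allclose(np.sort(inv_evals), np.sort(1 / evals)))  # True

# Condition number (symmetric case): |lambda|_max / |lambda|_min
print(np.isclose(np.linalg.cond(A), np.abs(evals).max() / np.abs(evals).min()))
```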
Find eigenvalues and eigenvectors of: \(\mathbf{A} = \begin{pmatrix} 3 & 0 \\ 0 & 7 \end{pmatrix}\)
Solution: \(\lambda_1 = 3\) with eigenvector \(\begin{pmatrix}1\\0\end{pmatrix}\), and \(\lambda_2 = 7\) with eigenvector \(\begin{pmatrix}0\\1\end{pmatrix}\). For diagonal matrices, eigenvalues are just the diagonal entries!
Find eigenvalues and eigenvectors of: \(\mathbf{B} = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\)
Characteristic equation: \(\lambda^2 - 2\lambda - 3 = 0 \implies (\lambda-3)(\lambda+1)=0\). So \(\lambda_1 = 3\) with eigenvector \(\begin{pmatrix}1\\1\end{pmatrix}\) and \(\lambda_2 = -1\) with eigenvector \(\begin{pmatrix}1\\-1\end{pmatrix}\).
Note the eigenvectors are orthogonal (dot product = 0), as guaranteed by the Spectral Theorem for symmetric matrices!
A particle moves between states A and B. From A: 70% stay, 30% go to B. From B: 50% stay, 50% go to A. Find the steady-state distribution.
Transition matrix: \(\mathbf{P} = \begin{pmatrix}0.7 & 0.5 \\ 0.3 & 0.5\end{pmatrix}\)
For \(\lambda = 1\): \((\mathbf{P}-\mathbf{I})\mathbf{v}=\mathbf{0}\) gives \(-0.3v_1 + 0.5v_2 = 0 \implies v_1 = \frac{5}{3}v_2\).
Normalizing: \(\boldsymbol{\pi} = \begin{pmatrix}5/8 \\ 3/8\end{pmatrix} = \begin{pmatrix}0.625 \\ 0.375\end{pmatrix}\)
Long run: 62.5% in state A, 37.5% in state B.
Find eigenvalues of: \(\mathbf{C} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\)
\(\lambda^2 + 1 = 0 \implies \lambda = \pm i\) (pure imaginary!)
This is a 90° rotation matrix. No real vector remains on its line after rotation, which is why eigenvalues are complex. Pure imaginary eigenvalues = pure rotation (no growth or decay) = undamped oscillation.
Given \(\mathbf{A}\mathbf{v} = \lambda\mathbf{v}\), prove that \(\mathbf{A}^2\mathbf{v} = \lambda^2\mathbf{v}\).
Proof: \(\mathbf{A}^2\mathbf{v} = \mathbf{A}(\mathbf{A}\mathbf{v}) = \mathbf{A}(\lambda\mathbf{v}) = \lambda(\mathbf{A}\mathbf{v}) = \lambda(\lambda\mathbf{v}) = \lambda^2\mathbf{v}\). By induction, this generalizes to \(\mathbf{A}^k\mathbf{v} = \lambda^k\mathbf{v}\) for all positive integers \(k\).
Find eigenvalues and eigenvectors of: \(\mathbf{A} = \begin{pmatrix} 5 & 4 & 2 \\ 4 & 5 & 2 \\ 2 & 2 & 2 \end{pmatrix}\)
The characteristic polynomial is \(-\lambda^3 + 12\lambda^2 - 21\lambda + 10 = 0\), which factors as \(-(\lambda-10)(\lambda-1)^2 = 0\).
Eigenvalues: \(\lambda_1 = 10\), \(\lambda_2 = 1\) (multiplicity 2).
For \(\lambda = 10\): \(\mathbf{v} = \begin{pmatrix}2\\2\\1\end{pmatrix}\)
For \(\lambda = 1\): the eigenspace is the plane \(2v_1 + 2v_2 + v_3 = 0\). Two independent eigenvectors (chosen orthogonal): \(\mathbf{v}_a = \begin{pmatrix}1\\-1\\0\end{pmatrix}\), \(\mathbf{v}_b = \begin{pmatrix}1\\1\\-4\end{pmatrix}\) (any two independent vectors in this 2D eigenspace would do).
Since geometric = algebraic multiplicity for both, the matrix is diagonalizable. And since \(\mathbf{A}\) is symmetric, the eigenvectors are orthogonal!
The zero vector \(\mathbf{v} = \mathbf{0}\) always satisfies \(\mathbf{A}\mathbf{v} = \lambda\mathbf{v}\) for any \(\lambda\). But it's NOT an eigenvector! Eigenvectors must be non-zero.
\(\lambda = 0\) IS a valid eigenvalue. It means the eigenvector gets mapped to the zero vector: \(\mathbf{A}\mathbf{v} = \mathbf{0}\). This happens when \(\mathbf{A}\) is singular.
It's \(\det(\mathbf{A} - \lambda\mathbf{I}) = 0\), NOT \(\det(\mathbf{A}) - \lambda = 0\). The \(\lambda\) goes on the diagonal through the identity matrix, then you take the determinant of the whole thing.
The matrix \(\begin{pmatrix}1 & 1\\0 & 1\end{pmatrix}\) has \(\lambda = 1\) (double), but only one eigenvector \(\begin{pmatrix}1\\0\end{pmatrix}\). It's NOT diagonalizable. You need Jordan form.
Only square matrices have eigenvalues and eigenvectors. For non-square matrices, use SVD instead.
Any scalar multiple of an eigenvector is also an eigenvector. The direction matters, not the length. That's why we often normalize to unit length.
```python
# === Basic Eigenvalue Computation ===
import numpy as np

A = np.array([[4, 1], [2, 3]])

# Get eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)        # Output: [5. 2.]
print("Eigenvectors (columns):")
print(eigenvectors)

# === For Symmetric Matrices (Faster & More Stable) ===
C = np.array([[4, 2], [2, 3]])
evals_C, evecs_C = np.linalg.eigh(C)      # 'h' = Hermitian/symmetric
# Returns eigenvalues in ascending order and orthonormal eigenvectors

# === PCA from Scratch ===
X = np.random.randn(100, 5)               # 100 samples, 5 features
X_centered = X - X.mean(axis=0)
cov_matrix = np.cov(X_centered.T)
evals, evecs = np.linalg.eigh(cov_matrix)

# Sort by descending eigenvalue
idx = np.argsort(evals)[::-1]
evals = evals[idx]
evecs = evecs[:, idx]

# Variance explained
variance_ratio = evals / evals.sum()
print("Variance explained:", variance_ratio)

# Project onto top 2 components
X_pca = X_centered @ evecs[:, :2]

# === Verify: A @ v = lambda * v ===
for i in range(len(eigenvalues)):
    lhs = A @ eigenvectors[:, i]
    rhs = eigenvalues[i] * eigenvectors[:, i]
    print(f"Check λ={eigenvalues[i]:.2f}:", np.allclose(lhs, rhs))  # True

# === SVD ===
U, S, Vt = np.linalg.svd(A)
# S contains singular values
# For symmetric matrices: singular values = |eigenvalues|

# === Useful: Condition Number ===
cond = np.linalg.cond(A)  # = sigma_max / sigma_min; large = numerically unstable
```
"The theory of eigenvalues is one of the great achievements of mathematics. Virtually every branch of science and engineering uses it."
Master eigenvalues and eigenvectors, and PCA, spectral methods, differential equations, and dynamical systems will feel natural.
© 2026 Sim Vattanac. All rights reserved.