Appendix I — Eigenvalue Problems

Given a square matrix \(A \in \mathbb{C}^{N \times N}\), the Eigenvalue Problem consists of finding a scalar \(\lambda \in \mathbb{C}\) and a vector \(\boldsymbol{v} \neq \boldsymbol{0}\) such that \(A\boldsymbol{v}=\lambda\boldsymbol{v}\). Any such \(\lambda\) is called an Eigenvalue of \(A\), while \(\boldsymbol{v}\) is the associated Eigenvector. For any matrix \(A\) and its eigenvalue \(\lambda\), the associated eigenvector is not unique; in fact, any multiple of an eigenvector is still an eigenvector. The eigenvalue/eigenvector pair will be written in Eigenpair notation as \(\left\{ \lambda ; \boldsymbol{v} \right\}.\)

In order to calculate the eigenvalues of a matrix \(A \in \mathbb{C}^{N \times N}\), consider the polynomial \[p(\lambda)=\det(A-\lambda \mathcal{I}).\] This is a polynomial of degree \(N\) in \(\lambda\), and any root of \(p(\lambda)\) is an eigenvalue of \(A\) and vice versa. If the highest order coefficient of \(p\) is equal to 1, then the polynomial is known as the Characteristic Polynomial of \(\boldsymbol{A}\); more generally, for any matrix \(A \in \mathbb{C}^{N \times N}\), the characteristic polynomial is given by \(P(\lambda)=(-1)^N \det(A-\lambda \mathcal{I})\). By the Fundamental Theorem of Algebra, this means that a matrix \(A\) of size \(N \times N\) must have \(N\) eigenvalues (not necessarily distinct). Also, if \(A\) is a real matrix, the polynomial \(p(\lambda)\) has real coefficients and therefore any complex eigenvalues appear in complex conjugate pairs. If \(A\) is a diagonal or triangular matrix, then the eigenvalues are simply the diagonal entries. After the eigenvalues have been found, the eigenvectors can be calculated by finding a general form of the vector \(\boldsymbol{v}\) that satisfies \((A-\lambda I)\boldsymbol{v}=\boldsymbol{0}\).

If the eigenvector \(\boldsymbol{v}\) is known, the eigenvalue \(\lambda\) can be recovered by using the Rayleigh Quotient \[\lambda=\frac{{\boldsymbol{v}}^{\mathrm{H}}A\boldsymbol{v}}{\|\boldsymbol{v}\|_2^2}\] where \({\boldsymbol{v}}^{\mathrm{H}}={\bar{\boldsymbol{v}}}^{\mathrm{T}}\) is the Hermitian of \(\boldsymbol{v}\) (the complex conjugate transpose).
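As a quick numerical illustration, the eigenpairs of a small matrix can be computed and the Rayleigh quotient checked. This is a minimal sketch assuming NumPy is available; the test matrix is arbitrary.

```python
import numpy as np

# An arbitrary small test matrix (any square matrix would do).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# np.linalg.eig returns all eigenvalues and the corresponding
# eigenvectors (stored as the columns of V).
lam, V = np.linalg.eig(A)

for k in range(A.shape[0]):
    v = V[:, k]
    # Rayleigh quotient: v^H A v / ||v||_2^2 recovers the eigenvalue.
    rayleigh = (v.conj() @ A @ v) / np.linalg.norm(v) ** 2
    print(np.isclose(rayleigh, lam[k]))        # True
    print(np.allclose(A @ v, lam[k] * v))      # A v = lambda v
```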

Caution

Let \[ A = \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix}.\] To find the eigenvalues, first consider the polynomial \[p(\lambda)=\det(A-\lambda I)=\det \left( \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}-\lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)=\det \begin{pmatrix} -\lambda & -1 \\ 1 & -\lambda \end{pmatrix}=\lambda^2+1.\] This polynomial has two roots, \(\lambda_1=\mathrm{i}\) and \(\lambda_2=-\mathrm{i}\), hence giving the two eigenvalues of \(A\).

To calculate the eigenvectors, consider the eigenvalues separately, then for each eigenvalue, find the vector \(\boldsymbol{v}={(V_1 \; \; V_2)}^{\mathrm{T}}\) that satisfies \((A-\lambda I)\boldsymbol{v}=\boldsymbol{0}\):

  • \(\lambda_1=\mathrm{i}\): \[(A-\lambda_1 I)\boldsymbol{v}=\boldsymbol{0} \quad \implies \quad\begin{pmatrix} -\mathrm{i}& -1 \\ 1 & -\mathrm{i} \end{pmatrix}\begin{pmatrix} V_1 \\ V_2 \end{pmatrix}=\begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \implies \quad\begin{matrix} -\mathrm{i}V_1 - V_2=0 \\ V_1-\mathrm{i}V_2=0 \end{matrix}\] which gives two equations in two unknowns. However, notice that multiplying the first equation by \(\mathrm{i}\) gives the second equation, so the problem is underdetermined (i.e. one equation in two unknowns). This must always be the case: finding an eigenvector always results in an underdetermined system. In this case, solving one equation suffices. Solving the second equation gives \(V_1\) in terms of \(V_2\) as \(V_1=\mathrm{i}V_2\). Therefore the eigenvector \(\boldsymbol{v}\) will be \[\boldsymbol{v}=\begin{pmatrix} V_1 \\ V_2 \end{pmatrix}=\begin{pmatrix} \mathrm{i}V_2 \\ V_2 \end{pmatrix}=\begin{pmatrix} \mathrm{i}\\ 1 \end{pmatrix}V_2.\] Any non-zero value of \(V_2\) can now be chosen, and the result will be an eigenvector (this also shows why any multiple of an eigenvector is also an eigenvector); in this case, choose \(V_2=1\). This gives the first eigenpair \[\left\{ \mathrm{i} \; ; \; \begin{pmatrix} \mathrm{i}\\ 1 \end{pmatrix} \right\}.\]
  • \(\lambda_2=-\mathrm{i}\): \[(A-\lambda_2 I)\boldsymbol{v}=\boldsymbol{0} \quad \implies \quad\begin{pmatrix} \mathrm{i}& -1 \\ 1 & \mathrm{i} \end{pmatrix}\begin{pmatrix} V_1 \\ V_2 \end{pmatrix}=\begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \implies \quad\begin{matrix} \mathrm{i}V_1 - V_2=0 \\ V_1+\mathrm{i}V_2=0 \end{matrix}\] which gives one equation in two unknowns (since the first equation multiplied by \(-\mathrm{i}\) gives the second). Solving the second equation gives \(V_1\) in terms of \(V_2\) as \(V_1=-\mathrm{i}V_2\). Therefore the eigenvector \(\boldsymbol{v}\) will be \[\boldsymbol{v}=\begin{pmatrix} V_1 \\ V_2 \end{pmatrix}=\begin{pmatrix} -\mathrm{i}V_2 \\ V_2 \end{pmatrix}=\begin{pmatrix} -\mathrm{i}\\ 1 \end{pmatrix}V_2.\] For the sake of simplicity, choose \(V_2=\mathrm{i}\) (once again, any non-zero value of \(V_2\) can be chosen). This gives the second eigenpair \[\left\{ -\mathrm{i} \; ; \; \begin{pmatrix} 1 \\ \mathrm{i} \end{pmatrix} \right\}.\]

Therefore, the matrix has the eigenpairs \[\left\{ \mathrm{i} \; ; \; \begin{pmatrix} \mathrm{i}\\ 1 \end{pmatrix} \right\}, \quad \left\{ -\mathrm{i} \; ; \; \begin{pmatrix} 1 \\ \mathrm{i} \end{pmatrix} \right\}.\] This can be verified by showing that \(A\boldsymbol{v}=\lambda \boldsymbol{v}\) for each eigenpair: \[A \boldsymbol{v}_1=\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} \mathrm{i}\\ 1 \end{pmatrix}=\begin{pmatrix} -1 \\ \mathrm{i} \end{pmatrix} = \mathrm{i}\begin{pmatrix} \mathrm{i}\\ 1 \end{pmatrix}=\lambda_1 \boldsymbol{v}_1\] \[A \boldsymbol{v}_2=\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ \mathrm{i} \end{pmatrix}=\begin{pmatrix} -\mathrm{i}\\ 1 \end{pmatrix} = -\mathrm{i}\begin{pmatrix} 1 \\ \mathrm{i} \end{pmatrix}=\lambda_2 \boldsymbol{v}_2.\]
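The same verification can also be carried out numerically; a minimal sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0, 0.0]])

# np.linalg.eig returns the eigenvalues i and -i (order not guaranteed)
# and eigenvectors proportional to (i, 1)^T and (1, i)^T found above.
lam, V = np.linalg.eig(A)
print(lam)

for k in range(2):
    # Verify A v = lambda v for each computed eigenpair.
    print(np.allclose(A @ V[:, k], lam[k] * V[:, k]))   # True
```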

For a matrix \(A \in \mathbb{C}^{N \times N}\), there will always be \(N\) eigenvalues (not necessarily distinct). If an eigenvalue is repeated, it may admit several linearly independent eigenvectors; however, there might not be a full set of \(N\) linearly independent eigenvectors in total.

If the matrix \(A\) has a complete set of eigenvectors (meaning it has \(N\) linearly independent eigenvectors), then \(A\) is said to be Diagonalisable, i.e. there exists a non-singular matrix \(V \in \mathbb{C}^{N\times N}\) whose columns are the eigenvectors of \(A\) and a diagonal matrix \(\Lambda \in \mathbb{C}^{N \times N}\) whose entries are the eigenvalues of \(A\), such that \(A=V\Lambda V^{-1}\). Note that the order in which the eigenvalues and eigenvectors are placed should be the same in both matrices; in other words, if the matrix \(A\) has \(N\) eigenpairs given by \(\left\{ \lambda_1;\boldsymbol{v}_1 \right\}, \left\{ \lambda_2;\boldsymbol{v}_2 \right\}, \dots, \left\{ \lambda_N;\boldsymbol{v}_N \right\}\), then \[\Lambda=\begin{pmatrix} \lambda_1 \\ & \lambda_2 \\ && \ddots \\ &&& \lambda_N \end{pmatrix}, \quad V=\begin{pmatrix} \vdots & \vdots && \vdots \\ \boldsymbol{v}_1 & \boldsymbol{v}_2 & \cdots & \boldsymbol{v}_N \\ \vdots & \vdots & & \vdots \end{pmatrix}.\]

Caution

From the example above, the matrices \(\Lambda\) and \(V\) are \[\Lambda=\begin{pmatrix} \mathrm{i}& 0 \\ 0 & -\mathrm{i} \end{pmatrix}, \quad V=\begin{pmatrix} \mathrm{i}& 1\\ 1 & \mathrm{i} \end{pmatrix}.\] The matrix \(A\) is diagonalisable, and the product \(V \Lambda V^{-1}\) should give \(A\); indeed \[V \Lambda V^{-1}=\begin{pmatrix} \mathrm{i}& 1\\ 1 & \mathrm{i} \end{pmatrix}\begin{pmatrix} \mathrm{i}& 0 \\ 0 & -\mathrm{i} \end{pmatrix}\begin{pmatrix} \mathrm{i}& 1\\ 1 & \mathrm{i} \end{pmatrix}^{-1}=\frac{1}{\mathrm{i}^2-1}\begin{pmatrix} \mathrm{i}& 1\\ 1 & \mathrm{i} \end{pmatrix}\begin{pmatrix} \mathrm{i}& 0 \\ 0 & -\mathrm{i} \end{pmatrix}\begin{pmatrix} \mathrm{i}& -1 \\ -1 & \mathrm{i} \end{pmatrix}\] \[=-\frac{1}{2}\begin{pmatrix} \mathrm{i}& 1\\ 1 & \mathrm{i} \end{pmatrix}\begin{pmatrix} -1 & -\mathrm{i}\\ \mathrm{i}& 1 \end{pmatrix}=-\frac{1}{2}\begin{pmatrix} 0 & 2 \\ -2 & 0 \end{pmatrix}=\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}=A.\]
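This check is also easy to perform numerically; a short sketch assuming NumPy:

```python
import numpy as np

A = np.array([[0, -1],
              [1, 0]], dtype=complex)

# Eigenvector matrix V and diagonal eigenvalue matrix Lambda from the example.
V = np.array([[1j, 1],
              [1, 1j]])
Lam = np.diag([1j, -1j])

# V Lambda V^{-1} should reproduce A.
print(np.allclose(V @ Lam @ np.linalg.inv(V), A))   # True
```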

Note that the existence of a complete system of eigenvectors is helpful in representing a linear transformation (or equivalently a square matrix) of a Euclidean space, such as \(\mathbb{R}^N\), as a simple dilation or scaling (i.e. a multiplication by a suitable factor along each coordinate axis) in a suitable system of coordinates, obtained from the original one by an invertible linear map.

If the matrix \(A\) is Hermitian, meaning that \({A}^{\mathrm{H}}=A\) (which happens to be the case in many important applications), then the eigenvalue problem is much simpler since the following properties hold:

Note

For example, in the numerical approximation of solutions of boundary value problems for second-order differential equations describing "conservative" physical processes, i.e. those where there is no dissipation of energy, or where it is very weak and can be neglected in the first instance.

All the eigenvalues of \(A\) are real. To see this, let \(\left\{ \lambda;\boldsymbol{v} \right\}\) be an eigenpair of \(A\), i.e. \(A\boldsymbol{v}=\lambda\boldsymbol{v}\), then \[\begin{align*} & A \boldsymbol{v}=\lambda \boldsymbol{v} \\ \quad \underset{{\boldsymbol{v}}^{\mathrm{H}} \times}{\Rightarrow} \quad & {\boldsymbol{v}}^{\mathrm{H}} A \boldsymbol{v}=\lambda {\boldsymbol{v}}^{\mathrm{H}} \boldsymbol{v} \\ \quad \underset{{A}^{\mathrm{H}}=A}{\Rightarrow} \quad & {\boldsymbol{v}}^{\mathrm{H}} {A}^{\mathrm{H}} \boldsymbol{v}=\lambda {\boldsymbol{v}}^{\mathrm{H}}\boldsymbol{v} \\ \quad \underset{{(A\boldsymbol{v})}^{\mathrm{H}}={\boldsymbol{v}}^{\mathrm{H}}{A}^{\mathrm{H}}}{\Rightarrow} \quad & {(A\boldsymbol{v})}^{\mathrm{H}} \boldsymbol{v}=\lambda {\boldsymbol{v}}^{\mathrm{H}}\boldsymbol{v} \\ \quad \underset{A\boldsymbol{v}=\lambda \boldsymbol{v}}{\Rightarrow} \quad & {(\lambda \boldsymbol{v})}^{\mathrm{H}} \boldsymbol{v}=\lambda {\boldsymbol{v}}^{\mathrm{H}}\boldsymbol{v} \\ \quad \underset{{(\alpha \boldsymbol{u})}^{\mathrm{H}}=\bar{\alpha} {\boldsymbol{u}}^{\mathrm{H}}}{\Rightarrow} \quad & \bar{\lambda} {\boldsymbol{v}}^{\mathrm{H}} \boldsymbol{v}=\lambda {\boldsymbol{v}}^{\mathrm{H}}\boldsymbol{v} \\ \quad \underset{{\boldsymbol{u}}^{\mathrm{H}}\boldsymbol{u}=\| \boldsymbol{u} \|_2^2}{\Rightarrow} \quad & \bar{\lambda} \| \boldsymbol{v} \|_2^2=\lambda \| \boldsymbol{v} \|_2^2 \\ \quad \underset{\div \| \boldsymbol{v} \|_2^2 \text{ since } \boldsymbol{v} \neq \boldsymbol{0}}{\Rightarrow} \quad & \bar{\lambda}=\lambda \quad \implies \quad\lambda \in \mathbb{R}. \end{align*}\]

Eigenvectors corresponding to distinct eigenvalues are orthogonal. To see this, let \(\left\{ \lambda;\boldsymbol{v} \right\}\) and \(\left\{ \mu;\boldsymbol{u} \right\}\) be real eigenpairs of \(A\) where \(\lambda \neq \mu\), i.e. \(A\boldsymbol{v}=\lambda \boldsymbol{v}\) and \(A\boldsymbol{u}=\mu \boldsymbol{u}\). Then \[\begin{align*} & A\boldsymbol{u}=\mu \boldsymbol{u} \\ \quad \underset{{\boldsymbol{v}}^{\mathrm{T}} \times}{\Rightarrow} \quad & {\boldsymbol{v}}^{\mathrm{T}} A \boldsymbol{u}=\mu {\boldsymbol{v}}^{\mathrm{T}} \boldsymbol{u} \\ \quad \underset{{(A\boldsymbol{v})}^{\mathrm{T}}={\boldsymbol{v}}^{\mathrm{T}} {A}^{\mathrm{T}}}{\Rightarrow} \quad & {(A\boldsymbol{v})}^{\mathrm{T}} \boldsymbol{u}=\mu {\boldsymbol{v}}^{\mathrm{T}} \boldsymbol{u} \\ \quad \underset{A\boldsymbol{v}=\lambda \boldsymbol{v}}{\Rightarrow} \quad & \lambda {\boldsymbol{v}}^{\mathrm{T}} \boldsymbol{u}=\mu {\boldsymbol{v}}^{\mathrm{T}} \boldsymbol{u} \\ \implies & (\lambda - \mu){\boldsymbol{v}}^{\mathrm{T}}\boldsymbol{u}=0 \\ \quad \underset{\lambda \neq \mu}{\Rightarrow} \quad & {\boldsymbol{v}}^{\mathrm{T}}\boldsymbol{u}=0 \quad \implies \quad\text{$\boldsymbol{u}$ and $\boldsymbol{v}$ are orthogonal.} \end{align*}\]

Let \(V\) be the matrix whose columns are \(\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_N\), the eigenvectors of \(A\) normalised to have unit 2-norm. Since the eigenvectors of \(A\) are orthogonal, \(\langle \boldsymbol{v}_i,\boldsymbol{v}_j \rangle=\delta_{ij}\) for all \(i,j=1,2,\dots,N\). This means that \(V\) must be an orthogonal matrix, i.e. \({V}^{\mathrm{T}}V=\mathcal{I}\). Moreover, since \(A=V\Lambda V^{-1}\) and \(V^{-1}={V}^{\mathrm{T}}\), then \(A=V \Lambda {V}^{\mathrm{T}}\). Therefore the diagonal matrix of eigenvalues is \(\Lambda={V}^{\mathrm{T}} A V\), or entrywise \({\boldsymbol{v}_i}^{\mathrm{T}} A \boldsymbol{v}_j=\delta_{ij}\lambda_i\).

Therefore, if a matrix \(A\) is real and symmetric, then the eigenvectors \(\boldsymbol{v}_1,\boldsymbol{v}_2,\dots,\boldsymbol{v}_N\) must satisfy \[{\boldsymbol{v}_i}^{\mathrm{T}} \boldsymbol{v}_j=\delta_{ij}=\begin{cases} 1 & i=j \\ 0 & i \neq j\end{cases} \quad \text{and} \quad {\boldsymbol{v}_i}^{\mathrm{T}} A \boldsymbol{v}_j=\delta_{ij}\lambda_i =\begin{cases} \lambda_i & i=j \\ 0 & i \neq j \end{cases}\quad \text{for all} \quad i,j=1,2,\dots,N.\] Since \(\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_N\) is a set of \(N\) linearly independent vectors in \(\mathbb{R}^N\), they must span \(\mathbb{R}^N\). Therefore, any vector \(\boldsymbol{x} \in \mathbb{R}^N\) with \(\| \boldsymbol{x} \|_2=1\) can be written as a linear combination of \(\boldsymbol{v}_1,\boldsymbol{v}_2,\dots,\boldsymbol{v}_N\), specifically \[\boldsymbol{x}=\sum_{j=1}^{N}{a_j \boldsymbol{v}_j} \quad \text{where} \quad a_j \in \mathbb{R}\quad \text{for all} \quad j=1,2,\dots,N.\] Therefore \[1=\| \boldsymbol{x} \|_2^2={\boldsymbol{x}}^{\mathrm{T}}\boldsymbol{x}=\sum_{j=1}^{N}{a_j^2} \quad \text{and} \quad {\boldsymbol{x}}^{\mathrm{T}}A\boldsymbol{x}=\sum_{j=1}^{N}{\lambda_j a_j^2}.\]
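These properties can be checked numerically on a random symmetric matrix; a sketch assuming NumPy, where `eigh` (intended for symmetric/Hermitian matrices) returns real eigenvalues and orthonormal eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# Build a random real symmetric matrix.
B = rng.standard_normal((N, N))
A = (B + B.T) / 2

lam, V = np.linalg.eigh(A)                       # columns of V are orthonormal

print(np.allclose(V.T @ V, np.eye(N)))           # v_i^T v_j = delta_ij
print(np.allclose(V.T @ A @ V, np.diag(lam)))    # v_i^T A v_j = delta_ij lambda_i

# For a unit vector x = sum_j a_j v_j, x^T A x = sum_j lambda_j a_j^2.
x = rng.standard_normal(N)
x /= np.linalg.norm(x)
a = V.T @ x                                      # the coefficients a_j
print(np.isclose(x @ A @ x, np.sum(lam * a**2))) # True
```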

Fibonacci Sequence

There are many applications of the eigenvalue decomposition. A simple one involves the analysis of the Fibonacci numbers. Consider the sequence \(\left\{ F_n \right\}_{n \in \mathbb{N}}\) which satisfies \[ F_0 = 0 \quad ; \quad F_1 = 1 \quad ; \quad F_{n+1} = F_n + F_{n-1}\ , \quad n \geq 1.\] It is known that the ratio \(\frac{F_{n+1}}{F_{n}} \to \varphi=\frac{1+\sqrt{5}}{2}\) as \(n \to \infty\). To show this in a different way using the eigenvalue decomposition, consider the vector \[\boldsymbol{u}_{n} = \begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix}.\] This vector satisfies the recurrence relation \[\boldsymbol{u}_{n}=A\boldsymbol{u}_{n-1} \quad \text{where} \quad A=\begin{pmatrix} 1 & 1\\ 1 & 0 \end{pmatrix} \quad \text{and} \quad \boldsymbol{u}_0=\begin{pmatrix} 1\\0 \end{pmatrix}.\] The vector \(\boldsymbol{u}_n\) can then be written in terms of \(\boldsymbol{u}_0\) by repeated substitution: \[\boldsymbol{u}_n=A\boldsymbol{u}_{n-1}=A^2 \boldsymbol{u}_{n-2}=\dots=A^{n-2} \boldsymbol{u}_{2}=A^{n-1} \boldsymbol{u}_{1}=A^n \boldsymbol{u}_0.\] Therefore \(\boldsymbol{u}_n = A^n \boldsymbol{u}_0\), but computing this directly requires calculating the \({n}^{\mathrm{th}}\) power of the matrix \(A\), which may be expensive to do.
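As a sanity check (a sketch assuming NumPy), the recurrence \(\boldsymbol{u}_n=A\boldsymbol{u}_{n-1}\) can be iterated directly to reproduce the Fibonacci numbers:

```python
import numpy as np

A = np.array([[1, 1],
              [1, 0]])
u = np.array([1, 0])          # u_0 = (F_1, F_0)^T

# u_n = A^n u_0, built up by repeated multiplication.
for _ in range(10):
    u = A @ u
print(u)                      # [89 55], i.e. (F_11, F_10)
```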

In order to circumvent calculating \(A^n\) explicitly, consider the eigenpairs of \(A\) which are (noting that the eigenvectors of \(A\) are orthogonal since \(A\) is symmetric) \[\left\{ \frac{1+\sqrt{5}}{2} \; ; \; \begin{pmatrix} 1+\sqrt{5} \\ 2 \end{pmatrix} \right\} \quad ; \quad \left\{ \frac{1-\sqrt{5}}{2} \; ; \; \begin{pmatrix} 1-\sqrt{5} \\ 2 \end{pmatrix} \right\}.\] For the sake of convenience, define \(\lambda_{\pm}=\frac{1 \pm \sqrt{5}}{2}\), then the eigenpairs can be rewritten as \[\left\{ \lambda_+ \; ; \; \begin{pmatrix} 2\lambda_+ \\ 2 \end{pmatrix} \right\} \quad ; \quad \left\{ \lambda_- \; ; \; \begin{pmatrix} 2\lambda_- \\ 2 \end{pmatrix} \right\}.\] Since any multiple of an eigenvector is still an eigenvector, both eigenvectors can be divided by 2 to give \[\left\{ \lambda_+ \; ; \; \begin{pmatrix} \lambda_+ \\ 1 \end{pmatrix} \right\} \quad ; \quad \left\{ \lambda_- \; ; \; \begin{pmatrix} \lambda_- \\ 1 \end{pmatrix} \right\}.\]

Let \(\Lambda\) be the diagonal matrix whose entries are the eigenvalues of \(A\) and let \(V\) be the matrix whose columns are the eigenvectors, i.e. \[\Lambda=\begin{pmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{pmatrix} \quad \text{and} \quad V=\begin{pmatrix} \lambda_+ & \lambda_- \\ 1 & 1 \end{pmatrix}.\] Since the eigenvalues are distinct, the eigenvectors are linearly independent, so \(A\) is diagonalisable and can be written as \(A=V \Lambda V^{-1}.\)

This can be verified as follows: \[\begin{align*} V \Lambda V^{-1} & = \begin{pmatrix} \lambda_+ & \lambda_- \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{pmatrix} \begin{pmatrix} \lambda_+ & \lambda_- \\ 1 & 1 \end{pmatrix}^{-1} \\ & = \frac{1}{\lambda_+ - \lambda_-} \begin{pmatrix} \lambda_+ & \lambda_- \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{pmatrix} \begin{pmatrix} 1 & -\lambda_- \\ -1 & \lambda_+ \end{pmatrix} \\ & = \frac{1}{\lambda_+ - \lambda_-} \begin{pmatrix} \lambda_+^2 & \lambda_-^2 \\ \lambda_+ & \lambda_- \end{pmatrix} \begin{pmatrix} 1 & -\lambda_- \\ -1 & \lambda_+ \end{pmatrix} \\ & = \frac{1}{\lambda_+ - \lambda_-} \begin{pmatrix} \lambda_+^2 - \lambda_-^2 & -\lambda_+^2 \lambda_-+\lambda_-^2 \lambda_+ \\ \lambda_+ - \lambda_- & -\lambda_+ \lambda_- + \lambda_- \lambda_+ \end{pmatrix} \\ & = \frac{1}{\lambda_+ - \lambda_-} \begin{pmatrix} \left( \lambda_+ + \lambda_- \right)\left( \lambda_+ - \lambda_- \right) & \lambda_+ \lambda_- \left( \lambda_- - \lambda_+ \right) \\ \lambda_+ - \lambda_- & 0 \end{pmatrix} \\ & = \begin{pmatrix} \lambda_+ + \lambda_- & - \lambda_+ \lambda_- \\ 1 & 0 \end{pmatrix} \\ & = \begin{pmatrix} \frac{1+\sqrt{5}}{2}+\frac{1-\sqrt{5}}{2} & -\left( \frac{1+\sqrt{5}}{2} \right)\left( \frac{1-\sqrt{5}}{2} \right) \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}=A. \end{align*}\]

Now consider the matrix \(A^2\): \[A^2=AA=\left( V \Lambda V^{-1} \right)\left( V \Lambda V^{-1} \right)=V \Lambda^2 V^{-1}.\] Since \(\Lambda\) is a diagonal matrix, \(\Lambda^2\) is also a diagonal matrix whose entries are the squares of the entries of \(\Lambda\), i.e. \[\Lambda^2=\begin{pmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{pmatrix}^2=\begin{pmatrix} \lambda_+^2 & 0 \\ 0 & \lambda_-^2 \end{pmatrix}.\] The higher powers follow in the same way, therefore \[A^n=V\begin{pmatrix} \lambda_+^n & 0 \\ 0 & \lambda_-^n \end{pmatrix}V^{-1} \quad \text{for all} \quad n \geq 1.\] This shows a way in which the matrix powers can be calculated easily. Returning to \(\boldsymbol{u}_n=A^n \boldsymbol{u}_0\): \[\boldsymbol{u}_n=A^n \boldsymbol{u}_0=V \begin{pmatrix} \lambda_+^n & 0 \\ 0 & \lambda_-^n \end{pmatrix} V^{-1} \begin{pmatrix} 1 \\ 0 \end{pmatrix}= \begin{pmatrix} \lambda_+ & \lambda_- \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \lambda_+^n & 0 \\ 0 & \lambda_-^n \end{pmatrix} \begin{pmatrix} \lambda_+ & \lambda_- \\ 1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \] \[ \implies \quad\boldsymbol{u}_n=\begin{pmatrix} \frac{\lambda_+^{n+1} - \lambda_-^{n+1}}{\lambda_+ - \lambda_-} \\ \frac{\lambda_+^{n} - \lambda_-^{n}}{\lambda_+ - \lambda_-} \end{pmatrix}.\] Therefore \[\frac{F_{n+1}}{F_n}= \frac{ \frac{\lambda_+^{n+1} - \lambda_-^{n+1}}{\lambda_+ - \lambda_-} }{ \frac{\lambda_+^{n} - \lambda_-^{n}}{\lambda_+ - \lambda_-} } = \frac{ \lambda_+^{n+1} - \lambda_-^{n+1} }{ \lambda_+^{n} - \lambda_-^{n}}.\] Since \(0< | \lambda_- | < 1\), \(\lambda_-^n\) tends to 0 as \(n\) tends to infinity. Therefore, passing to the limit as \(n\) tends to infinity gives \[\lim_{n \to \infty} \frac{F_{n+1}}{F_n} = \lim_{n \to \infty} \frac{ \lambda_+^{n+1} - \lambda_-^{n+1} }{ \lambda_+^{n} - \lambda_-^{n}} = \lim_{n \to \infty} \frac{ \lambda_+^{n+1} }{ \lambda_+^{n} } = \lambda_+=\frac{1+\sqrt{5}}{2}\] which is indeed the Golden Ratio.

In performing this procedure, there is one important caveat. The matrix \(V\) must be inverted, which is simple in the \(2 \times 2\) case but can be computationally expensive for much larger sizes. This, again, can be circumvented by ensuring that \(V\) is an orthogonal matrix. Recall that since \(A\) is symmetric, all its eigenvectors, and hence all the columns of \(V\), must be orthogonal. In order to make \(V\) an orthogonal matrix, the 2-norm of each of its columns must be equal to 1; this can be done by dividing each column by its norm (which is feasible since any multiple of an eigenvector is still an eigenvector). To normalise the vectors, divide them by their 2-norm: \[\left\|\begin{pmatrix} 1 \pm \sqrt{5} \\ 2 \end{pmatrix}\right\|_2^2=\left( 1 \pm \sqrt{5} \right)^2+2^2=10 \pm 2\sqrt{5}.\] Therefore, after normalisation, the eigenpairs will be \[\left\{ \lambda_+;c_+\begin{pmatrix} \lambda_+ \\ 1 \end{pmatrix} \right\} \quad ; \quad \left\{ \lambda_-;c_-\begin{pmatrix} \lambda_- \\ 1 \end{pmatrix} \right\} \quad \text{where} \quad c_{\pm}=\frac{2}{\sqrt{10 \pm 2\sqrt{5}}}.\] This means that the eigenvalue decomposition of \(A\) is \(A=\tilde{V}\Lambda \tilde{V}^{-1}\) where \[\tilde{V}=\begin{pmatrix} c_+\lambda_+ & c_-\lambda_- \\ c_+ & c_- \end{pmatrix} \quad \text{and} \quad \Lambda=\begin{pmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{pmatrix}.\] The most important fact about the matrix \(\tilde{V}\) is that it is an orthogonal matrix (meaning all its columns are orthonormal). Therefore \[\tilde{V}^{-1} = {\tilde{V}}^{\mathrm{T}} = \begin{pmatrix} c_+ \lambda_+ & c_+ \\ c_- \lambda_- & c_- \end{pmatrix}.\] The normalisation procedure is computationally cheap and so is matrix transposition, much more so than matrix inversion. The same matrix power formula can be used as before: \[\boldsymbol{u}_n=A^n \boldsymbol{u}_0=\tilde{V} \begin{pmatrix} \lambda_+^n & 0 \\ 0 & \lambda_-^n \end{pmatrix} {\tilde{V}}^{\mathrm{T}} \boldsymbol{u}_0 = \begin{pmatrix} c_+\lambda_+ & c_-\lambda_- \\ c_+ & c_- \end{pmatrix} \begin{pmatrix} \lambda_+^n & 0 \\ 0 & \lambda_-^n \end{pmatrix} \begin{pmatrix} c_+\lambda_+ & c_+ \\ c_-\lambda_- & c_- \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}\] \[\implies \quad\boldsymbol{u}_n =\begin{pmatrix} c_+^2 \lambda_+^{n+2}+c_-^2 \lambda_-^{n+2} \\ c_+^2 \lambda_+^{n+1}+c_-^2 \lambda_-^{n+1} \end{pmatrix}=\frac{1}{\sqrt{5}}\begin{pmatrix} \lambda_+^{n+1}-\lambda_-^{n+1} \\ \lambda_+^n-\lambda_-^n \end{pmatrix}\] since \(c_{\pm}^2 \lambda_{\pm}=\pm\frac{1}{\sqrt{5}}\). Therefore \[\frac{F_{n+1}}{F_n}=\frac{\frac{1}{\sqrt{5}} \left( \lambda_+^{n+1}-\lambda_-^{n+1} \right)}{\frac{1}{\sqrt{5}} \left( \lambda_+^{n}-\lambda_-^{n} \right)}=\frac{\lambda_+^{n+1}-\lambda_-^{n+1}}{\lambda_+^{n}-\lambda_-^{n}}\] which is the same result as before. The eigenvalue decomposition is still useful here; for larger symmetric matrices, normalising the eigenvectors in this way makes \(V\) orthogonal and avoids inverting it.
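The whole construction can be mirrored numerically; a sketch assuming NumPy, where `eigh` is used because \(A\) is symmetric, so the returned eigenvector matrix is orthogonal and its inverse is just its transpose:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# A is symmetric: eigh returns real eigenvalues and orthonormal eigenvectors,
# so V is orthogonal and V^{-1} = V^T.
lam, V = np.linalg.eigh(A)

def fib_pair(n):
    """Return u_n = (F_{n+1}, F_n) via u_n = V diag(lam^n) V^T u_0."""
    u0 = np.array([1.0, 0.0])
    return V @ np.diag(lam**n) @ V.T @ u0

print(fib_pair(10))                        # approximately [89. 55.]
Fn1, Fn = fib_pair(30)
print(Fn1 / Fn, (1 + np.sqrt(5)) / 2)      # the ratio approaches the golden ratio
```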

I.1 Calculating Eigenvalues Using the Power Method

For a diagonalisable matrix \(A\), the Power Method is an iterative process used to approximate the eigenvalue of largest absolute value and its associated eigenvector; variants of the method, described at the end of this section, also give the eigenvalue of smallest absolute value and eigenvalues close to a given shift.

Let \(A \in \mathbb{R}^{N\times N}\) be a real diagonalisable matrix, then \(A=V\Lambda V^{-1}\) where \(\Lambda \in \mathbb{R}^{N \times N}\) is the diagonal matrix whose terms are the eigenvalues of \(A\) and \(V \in \mathbb{R}^{N \times N}\) is the matrix whose columns are the eigenvectors of \(A\) corresponding to \(\Lambda\).

For now, suppose that \(A\) is real and symmetric, so that all the eigenvalues are real and all the eigenvectors are orthogonal; furthermore, the eigenvectors can be chosen to be orthonormal in order to make \(V\) an orthogonal matrix. Suppose that the eigenvalues of \(A\) are ordered in such a way that \[ |\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_N|, \tag{I.1}\] in this case, the largest eigenvalue in magnitude, here \(\lambda_1\), is called the Dominant Eigenvalue.

The Power Method can be summarised as follows: Start with an arbitrary unit vector \(\boldsymbol{q}^{(0)}\) that has a non-zero component in the direction of \(\boldsymbol{v}_1\) (i.e. \(\| \boldsymbol{q}^{(0)} \|_2=1\) and \({\left( \boldsymbol{q}^{(0)} \right)}^{\mathrm{T}}\boldsymbol{v}_1 \neq 0\)), then starting from \(k=1\):

  • Calculate \(\boldsymbol{z}^{(k)}=A\boldsymbol{q}^{(k-1)}\);
  • Update \(\boldsymbol{q}^{(k)}=\frac{\boldsymbol{z}^{(k)}}{\| \boldsymbol{z}^{(k)} \|_2}\) (meaning that \(\boldsymbol{q}^{(k)}\) is still a unit vector);
  • Update \(\alpha^{(k)}={(\boldsymbol{q}^{(k)})}^{\mathrm{T}}A\boldsymbol{q}^{(k)}\);
  • Update \(k \to k+1\) and repeat until \[\left| \alpha^{(k)}-\alpha^{(k-1)} \right| \leq \tau \left| \alpha^{(k)} \right|\] where \(\tau\) is the desired tolerance.

The final value of \(\alpha^{(k)}\) will be an approximation to the eigenvalue of \(A\) which has the largest magnitude. This result can be stated more formally as follows:
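A minimal sketch of this iteration, assuming NumPy; the function name `power_method` and the test matrix are purely illustrative:

```python
import numpy as np

def power_method(A, q0, tol=1e-10, max_iter=1000):
    """Approximate the dominant eigenpair of a symmetric matrix A.

    q0 must have a non-zero component along the dominant eigenvector.
    """
    q = q0 / np.linalg.norm(q0)
    alpha_old = q @ A @ q
    for _ in range(max_iter):
        z = A @ q                          # z^(k) = A q^(k-1)
        q = z / np.linalg.norm(z)          # q^(k) stays a unit vector
        alpha = q @ A @ q                  # Rayleigh quotient estimate of lambda_1
        if abs(alpha - alpha_old) <= tol * abs(alpha):
            break
        alpha_old = alpha
    return alpha, q

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam1, v1 = power_method(A, np.ones(3))
print(lam1)                                # close to the largest eigenvalue of A
print(np.linalg.norm(A @ v1 - lam1 * v1))  # small residual
```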

Theorem I.1 (Power Method) Let \(A\in\mathbb{R}^{N\times N}\) be symmetric with the eigenvalues \(\lambda_1,\lambda_2,\dots,\lambda_N\) and their corresponding eigenvectors \(\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_N\) such that \[|\lambda_1| > |\lambda_2| \geq \dots \geq |\lambda_N|.\] Consider a unit vector \(\boldsymbol{q}^{(0)}\) such that \({(\boldsymbol{q}^{(0)})}^{\mathrm{T}} \boldsymbol{v}_1 \neq 0\) (i.e. \(\boldsymbol{q}^{(0)}\) has a component in the direction of \(\boldsymbol{v}_1\)). Then the sequence of vectors \[\boldsymbol{q}^{(k)}=\frac{A \boldsymbol{q}^{(k-1)}}{\| A \boldsymbol{q}^{(k-1)} \|_2}\] converges to \(\boldsymbol{v}_1\) and \[\alpha^{(k)}={(\boldsymbol{q}^{(k)})}^{\mathrm{T}}A\boldsymbol{q}^{(k)}\] converges to \(\lambda_1\) as \(k\) tends to \(\infty\).

Proof. Since \(A\) is real and symmetric, then the eigenvectors \(\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_N\) can be chosen in such a way that they form an orthonormal basis of \(\mathbb{R}^N\), therefore the unit vector \(\boldsymbol{q}^{(0)}\) can be written as a linear combination of \(\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_N\) as \[\boldsymbol{q}^{(0)}=\frac{1}{\gamma^{(0)}} \sum_{i=1}^{N}{\beta_i \boldsymbol{v}_i} \quad \text{where} \quad \gamma^{(0)}=\sqrt{\sum_{i=1}^{N}{\beta_i^2}}.\] (The division by \(\gamma^{(0)}\) is to ensure that the vector \(\boldsymbol{q}^{(0)}\) is a unit vector.)

It can be proven, by induction (as detailed in Appendix \(\ref{App:Ind}\)), that \[\boldsymbol{q}^{(k)}=\frac{A \boldsymbol{q}^{(k-1)}}{\| A \boldsymbol{q}^{(k-1)} \|_2}=\frac{1}{\gamma^{(k)}}\sum_{i=1}^{N}{\beta_i \lambda_i^k \boldsymbol{v}_{i}} \quad \text{where} \quad \gamma^{(k)}=\sqrt{\sum_{i=1}^{N}{\beta_i^2 \lambda_i^{2k}}}.\] This can be rewritten by isolating the first term in the sum as \[\boldsymbol{q}^{(k)}=\frac{\beta_1 \lambda_1^k}{\gamma^{(k)}}\left( \boldsymbol{v}_1+\sum_{i=2}^{N}{\frac{\beta_i}{\beta_1} \frac{\lambda_i^k}{\lambda_1^k} \boldsymbol{v}_i} \right) \quad \text{with} \quad \gamma^{(k)}=\sqrt{\beta_1^2 \lambda_1^{2k} \left( 1+\sum_{i=2}^{N}{\frac{\beta_i^2}{\beta_1^2} \frac{\lambda_i^{2k}}{\lambda_1^{2k}}} \right)}.\]

Since the eigenvalues have been arranged so that \(\lambda_1\) is strictly the largest in absolute value, \(\left| \frac{\lambda_i}{\lambda_1} \right|<1\) for all \(i \geq 2\), and therefore \(\left| \frac{\lambda_i^k}{\lambda_1^k} \right|\) tends to 0 as \(k\) tends to \(\infty\). This means that as \(k\) tends to \(\infty\), \(\gamma^{(k)} \to \beta_1 \lambda_1^{k}\) and hence \(\boldsymbol{q}^{(k)} \to \boldsymbol{v}_1\). Now consider the expression for \(\alpha^{(k)}={(\boldsymbol{q}^{(k)})}^{\mathrm{T}}A\boldsymbol{q}^{(k)}\); passing to the limit as \(k\) tends to \(\infty\) gives \[\lim_{k \to \infty}\alpha^{(k)}=\lim_{k \to \infty}\left[ {(\boldsymbol{q}^{(k)})}^{\mathrm{T}}A\boldsymbol{q}^{(k)} \right] = {\boldsymbol{v}_1}^{\mathrm{T}}A\boldsymbol{v}_1 =\lambda_1\] since \(A\boldsymbol{v}_1=\lambda_1\boldsymbol{v}_1\) and \(\| \boldsymbol{v}_1 \|_2=1\).

The power method can be generalised in several ways:

  • Inverse Power Method: A possible generalisation involves applying the method to the inverse of the matrix \(A\) (provided \(A\) is non-singular). Since the eigenvalues of \(A^{-1}\) are the reciprocals of those of \(A\), the power method in that case gives an approximation to the eigenvalue of \(A\) of minimum modulus. This is called the Inverse Power Method, which can be formally stated as follows: Given an initial vector \(\boldsymbol{x}^{(0)} \neq \boldsymbol{0}\), let \(\boldsymbol{y}^{(0)} = \frac{\boldsymbol{x}^{(0)}}{\|\boldsymbol{x}^{(0)}\|_2}\). Then, for \(k \geq 1\), compute \[\boldsymbol{x}^{(k)} =A^{-1}\boldsymbol{y}^{(k-1)} \quad ; \quad \boldsymbol{y}^{(k)} =\frac{\boldsymbol{x}^{(k)}}{\|\boldsymbol{x}^{(k)}\|_2} \quad ; \quad \mu^{(k)}= {(\boldsymbol{y}^{(k)})}^{\mathrm{T}}A^{-1} \boldsymbol{y}^{(k)}.\] If \(A\) has \(N\) linearly independent eigenvectors and its eigenvalue of minimum modulus is distinct from all the others, then \[\lim_{k\rightarrow\infty}\mu^{(k)} = \frac{1}{\lambda_N}\] if the eigenvalues are arranged by size as before. This means that \((\mu^{(k)})^{-1}\) tends to \(\lambda_N\) as \(k\) tends to \(\infty\). Effectively, at every step \(k\), a linear system of the form \(A\boldsymbol{x}^{(k)} =\boldsymbol{y}^{(k-1)}\) needs to be solved. It is therefore convenient to compute the LU decomposition of \(A\) once at the start; each iteration then only requires solving two triangular systems (a short sketch is given after this list).

  • Power Method with Shift: Another generalisation of the power method involves approximating the (unknown) eigenvalue of \(A\) nearest to a given number \(\sigma\) (either real or complex). Let \(\lambda_\sigma\) denote such an eigenvalue and define the shifted matrix \(A_\sigma = A-\sigma \mathcal{I}\), whose eigenvalues are \(\lambda(A_\sigma) = \lambda(A)-\sigma\). In order to approximate \(\lambda_\sigma\), first approximate the eigenvalue of minimum modulus of \(A_\sigma\), say \(\lambda_N(A_\sigma)\), by applying the inverse power method to \(A-\sigma \mathcal{I}\), and then compute \(\lambda_\sigma = \lambda_N(A_\sigma) + \sigma\). This technique is known as the Power Method with Shift and the number \(\sigma\) is called the Shift. Obviously, the inverse power method (without shift) is recovered by simply setting \(\sigma = 0\).

  • QR Method: All the eigenvalues of \(A\) can be calculated at once by using the QR Method which is based on the QR decomposition of \(A\). Initialise the iteration with \(A^{(0)}=A\), then for \(k \geq 1\), calculate the QR decomposition of \(A^{(k-1)}\) as \(A^{(k-1)}=Q^{(k-1)}R^{(k-1)}\) and the next iterate will be \(A^{(k)}=R^{(k-1)}Q^{(k-1)}\). It can be proven that \(A^{(k)}\) converges to an upper triangular matrix as \(k\) tends to \(\infty\). Also \[A^{(k)}= {\left(Q^{(k-1)}\right)}^{\mathrm{T}}A^{(k-1)} Q^{(k-1)} = {\left(Q^{(k-1)}\right)}^{-1}A^{(k-1)} Q^{(k-1)}\] which means that, for all \(k\), \(A^{(k)}\) has the same eigenvalues as \(A^{(0)} = A\), so the diagonal entries of \(A^{(k)}\) get closer and closer to the required eigenvalues of \(A\) as \(k\) tends to \(\infty\).
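The sketch below, assuming NumPy and SciPy are available, illustrates the shifted inverse power method (with the LU factorisation of \(A-\sigma\mathcal{I}\) computed once) and the basic QR iteration; the function names and the test matrix are purely illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_power(A, sigma=0.0, tol=1e-10, max_iter=1000):
    """Approximate the eigenvalue of A closest to the shift sigma.

    A - sigma*I is factorised once; each iteration then only solves the
    two triangular systems of the LU factorisation. Setting sigma = 0
    gives the plain inverse power method.
    """
    N = A.shape[0]
    lu, piv = lu_factor(A - sigma * np.eye(N))   # LU factorisation, done once
    y = np.ones(N)
    y /= np.linalg.norm(y)
    mu_old = np.inf
    for _ in range(max_iter):
        x = lu_solve((lu, piv), y)       # x^(k) = (A - sigma I)^{-1} y^(k-1)
        mu = y @ x                       # Rayleigh quotient of (A - sigma I)^{-1}
        y = x / np.linalg.norm(x)
        if abs(mu - mu_old) <= tol * abs(mu):
            break
        mu_old = mu
    return 1.0 / mu + sigma              # eigenvalue of A nearest to sigma

def qr_method(A, iters=200):
    """Basic (unshifted) QR iteration; diag(A^(k)) approaches the eigenvalues."""
    Ak = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return np.diag(Ak)

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(shifted_inverse_power(A))          # eigenvalue of smallest modulus (sigma = 0)
print(shifted_inverse_power(A, 3.1))     # eigenvalue of A nearest to 3.1
print(qr_method(A))                      # all three eigenvalues
print(np.linalg.eigvals(A))              # reference values
```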