# Linear Algebra Notes

Notes on some linear algebra topics. Sometimes I find things are just stated
as being obvious, for example, that the dot product of two orthogonal vectors
is zero. Well... why is this? The result turned out to be pretty simple, but
it nonetheless took a little while to think through properly.
Hence, these notes... they're my musings on the "why", even if the
"why" might be obvious to the rest of the world!

## Matrices And Simultaneous Linear Equations

The set of linear equations:
$$
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2\\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m
$$
can be represented in matrix form:
$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n \\
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
\vdots \\
b_m \\
\end{bmatrix}
$$
Any vector $\vec x$, where $A\vec x = \vec b$, is a solution to the set of simultaneous
linear equations. How can we solve this from the matrix? We create an **augmented or partitioned matrix**:
$$
\left[
\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m\\
\end{array}
\right]
$$
Next this matrix must be converted to **row echelon form**. This is where each row has
at least as many leading zeros as the row above it (so any all-zero rows sit at the
bottom). For example, the following matrix is in row echelon form:
$$
\begin{bmatrix}
1 & 2 & 3 & 4 \\
0 & 1 & 2 & 3 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 2 \\
\end{bmatrix}
$$
But, this next one is not:
$$
\begin{bmatrix}
1 & 2 & 3 & 4 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 2 \\
0 & 1 & 2 & 3 \\
\end{bmatrix}
$$
We go from a matrix to the equivalent in row-echelon form using **elementary row operations**:

- Row interchange: $R_i \leftrightarrow R_j$
- Row scaling: $R_i \rightarrow kR_i$
- Row addition: $R_i \rightarrow R_i + kR_j$
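Before reaching for a library, the three operations can be sketched in plain Python (the matrix below is an arbitrary example, not one from these notes):

```python
# Elementary row operations on a matrix stored as a list of row lists.
def row_interchange(m, i, j):  # R_i <-> R_j
    m[i], m[j] = m[j], m[i]

def row_scale(m, i, k):        # R_i -> k * R_i
    m[i] = [k * v for v in m[i]]

def row_add(m, i, j, k):       # R_i -> R_i + k * R_j
    m[i] = [a + k * b for a, b in zip(m[i], m[j])]

# Reduce the augmented matrix for 2x + 4y = 10, x + 3y = 7 to
# row echelon form by hand:
m = [[2, 4, 10], [1, 3, 7]]
row_scale(m, 0, 0.5)   # R1 -> (1/2)R1, giving [1, 2, 5]
row_add(m, 1, 0, -1)   # R2 -> R2 - R1,  giving [0, 1, 2]
print(m)               # [[1.0, 2.0, 5.0], [0.0, 1.0, 2.0]]
```

Reading the result back: $y = 2$ and $x = 5 - 2 \times 2 = 1$.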

We can do this in Python using the `sympy` package and its `rref()` function:

```python
from sympy import Matrix, init_printing

init_printing(use_unicode=True)
system = Matrix((
    (3, 6, -2, 11),
    (1, 2, 1, 32),
    (1, -1, 1, 1),
))
system
system.rref()[0]
```

Which outputs:

```
⎡1  0  0  -17/3⎤
⎢0  1  0  31/3 ⎥
⎣0  0  1   17  ⎦
```

To then apply this to our augmented matrix we could do something like the following. Let's say we have a set of 3 simultaneous equations with 3 unknowns:
$$
3x + 6y - 2z = 11 \\
x + 2y + z = 32 \\
x - y + z = 1
$$
We'd make an augmented matrix like so:
$$
\left[
\begin{array}{ccc|c}
3 & 6 & -2 & 11\\
1 & 2 & 1 & 32\\
1 & -1 & 1 & 1
\end{array}
\right]
$$
To solve this using Python we do:

```python
from sympy import Matrix, solve_linear_system, init_printing
from sympy.abc import x, y, z

init_printing(use_unicode=True)
system = Matrix((
    (3, 6, -2, 11),
    (1, 2, 1, 32),
    (1, -1, 1, 1),
))
system
solve_linear_system(system, x, y, z)
```

Which will give the following output:

```
⎡1  2   1   32⎤
⎢3  6   -2  11⎥
⎣1  -1  1   1 ⎦

{x: -17/3, y: 31/3, z: 17}
```
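As a quick sanity check (illustrative only, not part of the original workflow), we can substitute $x = -17/3$, $y = 31/3$, $z = 17$ back into the three equations using exact fractions:

```python
from fractions import Fraction

x, y, z = Fraction(-17, 3), Fraction(31, 3), Fraction(17)

# Each tuple is (coefficients, right-hand side) for one equation.
equations = [((3, 6, -2), 11), ((1, 2, 1), 32), ((1, -1, 1), 1)]
for (a, b, c), rhs in equations:
    assert a * x + b * y + c * z == rhs
print("solution checks out")
```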

## Orthogonal Vectors and Linear Independence

*Figure 1*

**Orthogonal vectors** are vectors that are perpendicular to each other.
In a standard 2D or 3D graph, this means that
they are at right angles to each other and we can visualise them
as seen in Figure 1: $\vec x$ is at 90 degrees to both $\vec y$ and $\vec z$,
$\vec y$ is at 90 degrees to $\vec x$ and $\vec z$, and $\vec z$ is at
90 degrees to $\vec x$ and $\vec y$.

In any set of vectors, $[\vec{a_1}, ..., \vec{a_n}]$, if every (non-zero)
vector in the set is orthogonal to every other vector in the set, then the
vectors are **linearly independent**. (Note the converse doesn't hold: a
linearly independent set need not be orthogonal.)

So, how do we tell if one vector is orthogonal to another? The answer is
the **dot product** which is defined as follows.
$$x \cdot y = \sum_{i=1}^n {x_i y_i}$$

We know that two **vectors are orthogonal when their dot product is
zero: $x \cdot y = 0 \implies x$ and $y$ are orthogonal**.
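As a quick numeric sketch (the vectors here are arbitrary examples):

```python
def dot(x, y):
    """Dot product: the sum of the element-wise products."""
    assert len(x) == len(y)
    return sum(xi * yi for xi, yi in zip(x, y))

print(dot([1, 2], [3, 4]))   # 1*3 + 2*4 = 11, not orthogonal
print(dot([2, 1], [-1, 2]))  # 2*(-1) + 1*2 = 0, so these are orthogonal
```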

But why is this the case? Let's imagine any two arbitrary vectors, each ending on the circumference of a unit circle (so we know that they have the same length, and one is therefore a proper rotation of the other around the centre of the circle). This is shown in Figure 2. From the figure, we know the following:

Figure 2

$$
\begin{align}
x_1 &= r \cos{\theta_1}& y_1 &= r \sin{\theta_1} \\
x_2 &= r \cos({\theta_1 + \theta_n}) & y_2 &= r \sin({\theta_1 + \theta_n})
\end{align}
$$
The vector $(x_2, y_2)$ is the vector $(x_1, y_1)$ rotated by $\theta_n$ degrees.
We can use the following
trig identities:
$$
\begin{align}
\sin(a \pm b)& = \sin(a)\cos(b) \pm \cos(a)\sin(b) \\
\cos(a \pm b)& = \cos(a)\cos(b) \mp \sin(a)\sin(b)
\end{align}
$$
Substituting these into the formulas above, we get the following.
$$
\begin{align}
x_2 &= r\cos\theta_1\cos\theta_n - r\sin\theta_1\sin\theta_n \\
y_2 &= r\sin\theta_1\cos\theta_n + r\cos\theta_1\sin\theta_n \\
\end{align}
$$
Which means that...
$$
\begin{align}
x_2 &= x_1\cos\theta_n - y_1\sin\theta_n \\
y_2 &= y_1\cos\theta_n + x_1\sin\theta_n
\end{align}
$$
For a 90 degree rotation, $\theta_n = 90^\circ$, we know that $\cos\theta_n = 0$
and $\sin\theta_n = 1$. Substituting these values into the above equations,
we can clearly see that...
$$
\begin{align}
x_2 &= -y_1 \\
y_2 &= x_1
\end{align}
$$
Therefore, any vector $[a, b]$ in 2D space becomes
$[-b, a]$ under a 90 degree rotation, and the dot product becomes $-ab + ab = 0$. And
voilà, we now know **why** the dot product of two orthogonal 2D vectors is
zero. I'm happy to take it on faith that this extrapolates into n dimensions :)
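We can sketch that numerically too (the vector is an arbitrary example): rotating a 2D vector by 90 degrees using the $x_2$/$y_2$ formulas above, the dot product with the original comes out as (floating-point) zero.

```python
import math

def rotate(v, theta):
    """Rotate a 2D vector by theta radians (the x2/y2 formulas above)."""
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

v = (3.0, 4.0)                    # arbitrary example vector
w = rotate(v, math.pi / 2)        # 90-degree rotation
print(w)                          # approximately (-4.0, 3.0), i.e. [-b, a]
print(v[0] * w[0] + v[1] * w[1])  # dot product, approximately 0
```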

Another way of looking at this is to consider the law of cosines, $C^2 = A^2 + B^2 - 2AB\cos(\theta_c)$, where $\theta_c$ is the angle between sides $A$ and $B$:

Here $A$ and $B$ represent the magnitudes of the vectors $\vec x$ and $\vec y$, i.e. $\|\vec x\|$ and $\|\vec y\|$, and $C$ represents the magnitude of the vector $\vec x - \vec y$, i.e. $\|\vec x - \vec y\|$. Thus, we have this:

$$\|\vec x - \vec y\|^2 = \|\vec x\|^2 + \|\vec y\|^2 - 2\|\vec x\|\|\vec y\|\cos(\theta_c)$$

When the angle between the vectors is 90 degrees, $\cos(90^\circ) = 0$ and so that part of the equation disappears, giving us $\|\vec x - \vec y\|^2 = \|\vec x\|^2 + \|\vec y\|^2$. Now...
$$
\|\vec x - \vec y\|^2 = \|\vec x\|^2 + \|\vec y\|^2 \\
\therefore \|\vec x\|^2 + \|\vec y\|^2 - \|\vec x - \vec y\|^2 = 0
$$
Recall that $\|\vec v\| = \sqrt{v_1^2 + v_2^2 + ... + v_n^2}$ and therefore $\|\vec v\|^2 = v_1^2 + v_2^2 + ... + v_n^2$.

Based on this, we can say that...
$$
\begin{align}
\|\vec x - \vec y\|^2 &= (x_1 - y_1)^2 + (x_2 - y_2)^2 + ... + (x_n - y_n)^2 \\
&= x_1^2 - 2x_1y_1 + y_1^2 + x_2^2 - 2x_2y_2 + y_2^2 + ... + x_n^2 - 2x_ny_n + y_n^2
\end{align}
$$
Now, all of the $x_i^2$ and $y_i^2$ terms above are cancelled out by the $\|\vec x\|^2$ and $\|\vec y\|^2$ terms, leaving us with...
$$
\|\vec x\|^2 + \|\vec y\|^2 - \|\vec x - \vec y\|^2 = 2x_1y_1 + 2x_2y_2 + ... + 2x_ny_n = 0
$$
Which is just twice the dot product when the vectors are at 90 degrees (or just the dot product, since we can divide through by 2).
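A numeric sketch of that identity, using a pair of arbitrary orthogonal example vectors:

```python
def norm_sq(v):
    """Squared length of a vector: v1^2 + v2^2 + ... + vn^2."""
    return sum(vi * vi for vi in v)

x = [2, 1, 0]   # arbitrary example vectors with x . y = -2 + 2 + 0 = 0
y = [-1, 2, 5]
diff = [xi - yi for xi, yi in zip(x, y)]

# ||x||^2 + ||y||^2 - ||x - y||^2 should equal 2 * (x . y).
lhs = norm_sq(x) + norm_sq(y) - norm_sq(diff)
dot = sum(xi * yi for xi, yi in zip(x, y))
print(lhs, 2 * dot)   # both 0 for these orthogonal vectors
```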

## Orthogonal Matrices

So, onto orthogonal matrices. A **matrix is orthogonal if $AA^T = A^TA = I$**.
If we take a general matrix and multiply it by its transpose we get...
$$
AA^T =
\begin{pmatrix}
a & c \\
b & d \\
\end{pmatrix}
\begin{pmatrix}
a & b \\
c & d \\
\end{pmatrix}
=
\begin{pmatrix}
a^2 + c^2 & ab + cd \\
ab + cd & b^2 + d^2 \\
\end{pmatrix}
$$

The pattern $ab + cd$ looks pretty familiar, right?! It looks a lot like
$x_1x_2 + y_1y_2$, our formula for the dot product of two vectors. So,
if we set $a=x_1$, $b=x_2$, $c=y_1$ and $d=y_2$, we get
a matrix of two *row vectors*, $\vec v_1 = [x_1, y_1]$ and $\vec v_2 = [x_2, y_2]$:
$$
A =
\begin{pmatrix}
\vec v_1 \\
\vec v_2 \\
\end{pmatrix} =
\begin{pmatrix}
x_1 & y_1 \\
x_2 & y_2 \\
\end{pmatrix}
$$
$$
\therefore AA^T =
\begin{pmatrix}
x_1 & y_1 \\
x_2 & y_2 \\
\end{pmatrix}
\begin{pmatrix}
x_1 & x_2 \\
y_1 & y_2 \\
\end{pmatrix}
=
\begin{pmatrix}
x_1^2 + y_1^2 & x_1x_2 + y_1y_2 \\
x_1x_2 + y_1y_2 & x_2^2 + y_2^2 \\
\end{pmatrix}
$$
If our vectors $\vec v_1$ and $\vec v_2$ are orthogonal,
then the component $x_1x_2 + y_1y_2$ must be zero. This gives us the first
part of the identity matrix pattern we're looking for.

The other part of the identity matrix implies that we must have $x_1^2 + y_1^2 = 1$ and $x_2^2 + y_2^2 = 1$... which are just the formulas for the square of the length of a vector. Therefore, if our two vectors are also normalised (i.e. have a length of 1), we have our identity matrix.

Does the same hold true for $A^TA$? It doesn't if we use our original matrix $A$!...
$$
A^TA =
\begin{pmatrix}
x_1 & x_2 \\
y_1 & y_2 \\
\end{pmatrix}
\begin{pmatrix}
x_1 & y_1 \\
x_2 & y_2 \\
\end{pmatrix}
=
\begin{pmatrix}
x_1^2 + x_2^2 & x_1y_1 + x_2y_2 \\
x_1y_1 + x_2y_2 & y_1^2 + y_2^2 \\
\end{pmatrix}
$$
Oops, we can see that we didn't get the identity matrix!! But perhaps we can see why. If $A$ was a matrix of row vectors, then $A^T$ is a matrix of column vectors. So for $AA^T$ we were multiplying a matrix of row vectors by a matrix of column vectors, which, in part, gave us the dot products as we saw. So if we want $A^TA$ to work the same way, $A$ now has to be a matrix of *column* vectors, because then $A^T$ becomes a matrix of row vectors and we're back to our original situation:
$$
A^TA =
\begin{pmatrix}
x_1 & y_1 \\
x_2 & y_2 \\
\end{pmatrix}
\begin{pmatrix}
x_1 & x_2 \\
y_1 & y_2 \\
\end{pmatrix}
=
\begin{pmatrix}
x_1^2 + y_1^2 & x_1x_2 + y_1y_2 \\
x_1x_2 + y_1y_2 & x_2^2 + y_2^2 \\
\end{pmatrix}
$$

So we can say that if we have a **matrix whose rows are orthonormal vectors
and whose columns are also orthonormal vectors, then we have
an orthogonal matrix**!

Okay, that's great n' all, but **why should we care? Why are orthogonal
matrices useful?** It turns out that orthogonal matrices preserve the
angles and lengths of vectors. This can be useful in graphics to rotate
vectors whilst keeping the shape they construct, or in numerical analysis,
because they do not amplify errors.
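As a small sketch of that length-preserving property (the rotation angle and test vector below are arbitrary example values): a 2D rotation matrix is orthogonal, so $QQ^T \approx I$ and $\|Qv\| = \|v\|$.

```python
import math

theta = 0.7                      # arbitrary rotation angle in radians
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

# Q Q^T should be the identity (up to floating-point error).
QQt = [[sum(Q[i][k] * Q[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]
print(QQt)                       # approximately [[1, 0], [0, 1]]

# Lengths are preserved: ||Qv|| == ||v||.
v = [3.0, 4.0]
w = matvec(Q, v)
print(math.hypot(*v), math.hypot(*w))   # both 5.0, up to rounding
```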

## Determinants

This is one of the better YouTube tutorials I've found that explains the *concepts*, and not just the
mechanics, behind determinants, by 3Blue1Brown (awesome!).

## Eigenvectors and Eigenvalues

Stackoverflow thread: What is the importance of eigenvalues and eigenvectors.

Eigenvectors and values exist in pairs: every eigenvector has a corresponding eigenvalue. An eigenvector is a direction, ... An eigenvalue is a number, telling you how much variance there is in the data in that direction ...

## PCA

Really liked these YouTube tutorials by Data4Bio:

- Dimensionality Reduction: High Dimensional Data, Part 1
- Dimensionality Reduction: High Dimensional Data, Part 2
- Dimensionality Reduction: High Dimensional Data, Part 3
- Dimensionality Reduction: Principal Components Analysis, Part 1
- Dimensionality Reduction: Principal Components Analysis, Part 2
- Dimensionality Reduction: Principal Components Analysis, Part 3