# Linear Algebra Notes

Notes on some linear algebra topics. Sometimes I find things are just stated
as being obvious, for example, that the dot product of two orthogonal vectors
is zero. Well... why is this? The result is pretty simple, but
nonetheless it took a little while to think it through properly.
Hence, these notes... they're my musings on the "why", even if the
"why" might be obvious to the rest of the world! They have also degraded
into plain notes, without the musings :'(...

A lot of these notes are now just based on the amazing Khan Academy linear algebra tutorials.

## Vector Stuff

### Vector Spaces

References:

- What isn't a vector space?, Mathematics Stack Exchange.
- Things that are/aren't vector spaces, Dept. of Mathematics and Statistics, University of New Mexico.
- Vector Spaces, physics.miami.edu.
- Vector space, Wikipedia.

A vector is an element of a *vector space*. A *vector space*, also known as a
*linear space*, is a set which has addition and scalar multiplication defined
for it and satisfies the following for any vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ and scalars
$c$ and $d$:

| #  | Rule                          | Definition |
|----|-------------------------------|------------|
| 1  | Closed under addition         | $\vec{u} + \vec{v} \in V$ |
| 2  | Commutative                   | $\vec{u} + \vec{v} = \vec{v} + \vec{u}$ |
| 3  | Associative                   | $(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$ |
| 4  | Additive identity             | $\vec{u} + \vec{0} = \vec{u}$ |
| 5  | Inverse                       | $\vec{u} + (-\vec{u}) = \vec{0}$ |
| 6  | Closed under scalar multiply  | $c\vec{u} \in V$ |
| 7  | Distributive                  | $c(\vec{u} + \vec{v}) = c\vec{u} + c\vec{v}$ |
| 8  | Distributive                  | $(c + d)\vec{u} = c\vec{u} + d\vec{u}$ |
| 9  | Distributive                  | $c(d\vec{u}) = (cd)\vec{u}$ |
| 10 | Multiply identity             | $1\vec{u} = \vec{u}$ |

#### Function Spaces

First let's properly define a function. If $X$ and $Y$ are sets and we have $x \in X$ and $y \in Y$,
we can form a set that consists of *ordered* pairs $(x, y)$. If it is the case that
$(x, y_1) \in f$ and $(x, y_2) \in f$ implies $y_1 = y_2$, then $f$ is a function and
we write $y = f(x)$. This is called being "single valued", i.e., each input to the
function gives an unambiguous output. Note, $f(x)$ is not a function, it is the value *returned* by
the function: $f$ is the function that defines the mapping from elements in $X$ (the *domain*)
to the elements in $Y$ (the *range*).

One example of a vector space is the set of real-valued functions of a real variable, defined on a given domain. I.e., we're talking about a set of functions where each element is itself a function, with the aforementioned restriction on the domain.

We've said that a vector is any element in a vector space, but usually we think of this in terms of, say, $\mathbb{R}^n$. We can also, however, given the above, think of functions as vectors: the function-as-a-vector being the "vector" of ordered pairs that make up the mapping between domain and range. So in this case the set of functions is still like a set of vectors, and with scalar multiplication and vector addition defined appropriately the rules for a vector space still hold!

#### What Isn't A Vector Space

It sounded a lot like everything was a vector space to me! So I did a quick bit of googling and found the paper "Things that are/aren't vector spaces". It does a really good job of showing how in maths we might build up a model of something and expect it to behave "normally" (i.e., to obey the vector space axioms), when in fact it doesn't, in which case it becomes a lot harder to reason about. This is why vector spaces are nice... we can reason about them using logic and operations that we are very familiar with and that feel intuitive.

### Linear Combinations

A linear combination of vectors is one that only uses the addition and scalar multiplication of those vectors. So, given two vectors $\vec{a}$ and $\vec{b}$, any expression of the form $c_1\vec{a} + c_2\vec{b}$, for scalars $c_1$ and $c_2$, is a linear combination. This can be generalised to say that for any set of vectors $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$, a linear combination of those vectors is $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n$.
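
Just to make this concrete, here's a tiny numpy sketch (the vectors and scalars are made up purely for illustration):

```python
import numpy as np

# Two example vectors (hypothetical values chosen just for illustration).
a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# A linear combination c1*a + c2*b uses only scalar multiplication and addition.
c1, c2 = 2.0, -0.5
combo = c1 * a + c2 * b
print(combo)  # [0.5 4.5]
```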

### Spans

The *set of all linear combinations* of a set of vectors is called the *span* of those vectors:

$\text{span}(\vec{v}_1, \ldots, \vec{v}_n) = \{c_1\vec{v}_1 + \cdots + c_n\vec{v}_n \mid c_i \in \mathbb{R}\}$

In the graph to the right there are two vectors $\vec{a}$ and $\vec{b}$. The dotted blue lines show
all the possible scalings of the vectors. The vectors span those lines. The green lines show
two particular scalings of $\vec{a}$ added to all the possible scalings of $\vec{b}$. One can
use this to imagine that if we added all the possible scalings of $\vec{a}$ to all the possible
scalings of $\vec{b}$ we would reach every coordinate in $\mathbb{R}^2$. Thus, we can say that
the set of vectors $\{\vec{a}, \vec{b}\}$ spans $\mathbb{R}^2$, or $\text{span}(\vec{a}, \vec{b}) = \mathbb{R}^2$.

From this we may be tempted to say that any set of two 2D vectors spans $\mathbb{R}^2$, but we
would be wrong. Take, for example, two vectors that are scalar multiples of each other. These two vectors are
co-linear. Because
they span the same line, no combination of them can ever move off that line. Therefore they
do *not* span $\mathbb{R}^2$.

Let's write the above out a little more thoroughly... Say $\vec{b} = k\vec{a}$ for some scalar $k$. The span of these vectors is, by our above definition, $\{c_1\vec{a} + c_2 k\vec{a}\}$, which we can re-write as $\{(c_1 + c_2 k)\vec{a}\}$... which we can clearly see is just the set of all scaled versions of $\vec{a}$. Plainly, any co-linear set of 2D vectors will not span $\mathbb{R}^2$.

### Linear Independence

We saw above that a set of co-linear vectors does
not span $\mathbb{R}^2$. The reason for this is that they are linearly *dependent*, which
means that *one or more vectors in the set are just linear combinations of other vectors
in the set*.

For example, suppose we have a set of three 2D vectors that spans $\mathbb{R}^2$, where one of the vectors is a linear combination of the other two. That vector is redundant because we can remove it from the set of vectors and still have a set that spans $\mathbb{R}^2$. In other words, it adds no new information or directionality to our set.

A set of vectors $\{\vec{v}_1, \ldots, \vec{v}_n\}$ is said to be linearly *dependent* when
$c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n = \vec{0}$ has a solution in which not all of the scalars $c_i$ are zero.
Thus a set of vectors is linearly **independent** when the only solution to that equation is $c_1 = c_2 = \cdots = c_n = 0$.
Sure, this is a definition, but why does a non-trivial solution mean that the set is linearly *dependent*?

Think of vectors in $\mathbb{R}^2$. For any two vectors that are not co-linear, there is no combination of those two vectors that can get you back to the origin unless they are both scaled by zero. The image to the left is trying to demonstrate that. If we scale one of the vectors, no matter how small we make that scaling, we cannot find a scaling of the other that, when added to it, will get us back to the origin. Thus we cannot find a linear combination equal to the zero vector in which $c_1$ and/or $c_2$ does not equal zero.

This shows that neither vector is a linear combination of the other, because if it were, we could use a non-zero scaling of one, added to the other, to get us back to the origin.

We can also show that two (non-zero) vectors are linearly independent when their dot product is zero. This is covered in a later section.

Another way of thinking about LI is that there is only one solution to the equation $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n = \vec{0}$: the trivial solution in which every $c_i$ is zero.
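
A quick sketch of that idea using sympy (the matrices are made-up examples): stack the vectors as columns and check whether the null space is trivial.

```python
from sympy import Matrix

# Columns are the vectors being tested (example values only).
independent = Matrix([[1, 0],
                      [0, 1]])
dependent = Matrix([[2, 4],    # second column is 2 * first column
                    [1, 2]])

# An empty nullspace (only the zero vector) means the columns are LI.
print(independent.nullspace())  # [] -> linearly independent
print(dependent.nullspace())    # [Matrix([[-2], [1]])] -> linearly dependent
```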

### Linear Subspace

If $V$ is a set of vectors in $\mathbb{R}^n$, then $V$ is a *subspace* of $\mathbb{R}^n$
if and only if:

- $V$ **contains the zero vector**: $\vec{0} \in V$.
- $V$ **is closed under scalar multiplication**: if $\vec{x} \in V$ and $c \in \mathbb{R}$, then $c\vec{x} \in V$.
- $V$ **is closed under addition**: if $\vec{a} \in V$ and $\vec{b} \in V$, then $\vec{a} + \vec{b} \in V$.

What these rules mean is that, using linear operations, you can't do anything to "break" out of the subspace using vectors from that subspace. Hence "closed".

Surprisingly, at least for me, the set containing only the zero vector, $\{\vec{0}\}$, is actually a subspace of $\mathbb{R}^n$ because the set obeys all of the above rules.

A very useful thing is that a *span is always a valid subspace*. Take, for example,
$V = \text{span}(\vec{v}_1, \vec{v}_2, \vec{v}_3)$. We can show that it is a valid subspace by checking that it
obeys the above 3 rules.

First, does it contain the zero vector? Recall that a span is the set of all linear combinations
of the span vectors. So we know that the linear combination $0\vec{v}_1 + 0\vec{v}_2 + 0\vec{v}_3 = \vec{0}$ is valid, and hence the
span contains the zero vector.

Second, is it closed under scalar multiplication? Because the span is the set of *all*
linear combinations, for any member $c_1\vec{v}_1 + c_2\vec{v}_2 + c_3\vec{v}_3$ and any scalar $a$, the result
$a(c_1\vec{v}_1 + c_2\vec{v}_2 + c_3\vec{v}_3) = (ac_1)\vec{v}_1 + (ac_2)\vec{v}_2 + (ac_3)\vec{v}_3$ is just another linear combination, so it
must also be in the subspace, by the definition of a span!

Third, is it closed under vector addition? Well, yes, the same argument as made above would apply.

Interestingly, using the same method as above, we can see that even something trivial, like the span of a single vector, which is just a line, is a subspace of $\mathbb{R}^n$.

It can also be shown that *any basis for a given subspace will have the same number of elements*. We say that the **dimension of a subspace** is the number of
elements in a basis for the subspace. So, for example, if we had a
basis for our subspace consisting of 13 vectors, the dimension of the subspace would be lucky 13.

### Basis Of A Subspace

Recall that a span, $V = \text{span}(\vec{v}_1, \ldots, \vec{v}_n)$, is a subspace of $\mathbb{R}^m$ (where each vector $\vec{v}_i \in \mathbb{R}^m$). We also know that a span is the set of all possible linear combinations of these vectors.

If we choose a subset $S$ of $V$, where all the vectors in $S$ are linearly *independent*,
and we can write any vector in $V$ as a linear combination of vectors in $S$,
then we say that $S$ is a *basis* for $V$.

$S$ acts as a "minimal" set of vectors required to span $V$. I.e., if you added any other vector to $S$ it would be superfluous, because it would be a linear combination of the vectors already in $S$. It wouldn't add any new information.

Put more formally: Let $V$ be a subspace of $\mathbb{R}^n$. A subset $\{\vec{v}_1, \ldots, \vec{v}_k\}$ is a basis for $V$ iff the following are true:

- The vectors span $V$, and
- They are linearly *in*dependent.

A subspace can have an infinite number of bases, but there are *standard bases*. For
example, in $\mathbb{R}^2$ the standard basis is $\left\{\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right\}$. The rule of **invariance of dimension** states that any two bases for a subspace contain the same number of vectors. The number of elements in a basis for a subspace is called the subspace's **dimension**, denoted $\dim(V)$, where $V$ is the subspace.

Thus, we can say that every subspace of $\mathbb{R}^n$ has a basis.

A basis is a good thing because we can break down any element in the space that the basis defines into a linear combination of basic building blocks, i.e. the basis vectors.

To **find a basis** use the following method (see the sketch after this list):

- Form a matrix where the $j^{th}$ column is the $j^{th}$ vector $\vec{v}_j$.
- Reduce the matrix to row-echelon form.
- Take the columns that contain a pivot; the corresponding original vectors are the basis vectors.
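
A minimal sketch of that recipe using sympy (the spanning vectors are made up, and the third one is deliberately redundant):

```python
from sympy import Matrix

# Columns are the spanning vectors (example values; the third column is the
# sum of the first two, so it is redundant).
A = Matrix([[1, 0, 1],
            [0, 1, 1],
            [1, 1, 2]])

rref_matrix, pivot_cols = A.rref()
print(pivot_cols)                    # (0, 1) -> first two columns hold pivots
basis = [A.col(i) for i in pivot_cols]
print(basis)                         # the basis vectors for the span
```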

TODO discuss the canonical vectors in R2 and R3. , ... but we could also have , .

TODO unique coefficients. implied by the linear independence

TODO most important bases are orthogonal and orthonormal.

Orthonormal bases are good because we can find the coefficients in the new basis easily: the coefficient for each basis vector is just the dot product of our vector with that basis vector.

### The Dot Product

This is a good reference.

The dot product is one definition for a vector multiplication that results in a scalar. Given two equal-length vectors, the dot product looks like this: $\vec{a} \cdot \vec{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n$. Written more succinctly we have $\vec{a} \cdot \vec{b} = \sum_{i=1}^{n} a_ib_i$. The dot product is important because it will keep popping up in many places. Some particular uses are to calculate the length of a vector and to show that two vectors are linearly independent. The operation has the following properties:

| Property                    | Rule |
|-----------------------------|------|
| Commutative                 | $\vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a}$ |
| Distributive                | $\vec{a} \cdot (\vec{b} + \vec{c}) = \vec{a} \cdot \vec{b} + \vec{a} \cdot \vec{c}$ |
| Associative (with a scalar) | $(c\vec{a}) \cdot \vec{b} = c(\vec{a} \cdot \vec{b})$ |
| Cauchy-Schwarz Inequality   | $\lvert \vec{a} \cdot \vec{b} \rvert \le \lVert\vec{a}\rVert \lVert\vec{b}\rVert$, where $\vec{a}$ and $\vec{b}$ are non-zero |
| Vector Triangle Inequality  | $\lVert\vec{a} + \vec{b}\rVert \le \lVert\vec{a}\rVert + \lVert\vec{b}\rVert$ |

The dot product also defines *the angle between vectors*. Take two unit vectors $\hat{u}$ and $\hat{v}$ in 2D.
We can calculate $\hat{u} \cdot \hat{v} = u_1v_1 + u_2v_2$.
We know that, on the unit circle, $\hat{u} = (\cos\alpha, \sin\alpha)$ and $\hat{v} = (\cos\beta, \sin\beta)$ for some angles $\alpha$ and $\beta$.
And we know that $\cos\alpha\cos\beta + \sin\alpha\sin\beta = \cos(\alpha - \beta)$.
Therefore, by substitution, $\hat{u} \cdot \hat{v} = \cos(\alpha - \beta) = \cos\theta$, where $\theta$ is the angle between the vectors.
Thus, **$\hat{u} \cdot \hat{v} = \cos\theta$ describes the angle between unit vectors**. If the vectors are not unit
vectors then the angle is defined, in general, by:

$\cos\theta = \dfrac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}$
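
A quick numerical check of that formula using numpy (made-up vectors):

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# cos(theta) = (a . b) / (|a| |b|)
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.degrees(np.arccos(cos_theta)))  # 45.0 degrees
```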

### Unit Length Vectors

A unit vector is a vector that has a length of 1. If $\hat{u}$ is a vector (notice the special "hat" symbol used) then we define its length as:

$\|\hat{u}\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$

Thus if a vector is a unit vector then $\|\hat{u}\| = 1$.

Interestingly, we can also write $\|\vec{u}\|^2 = \vec{u} \cdot \vec{u}$.

To construct a unit vector we need $u_1^2 + u_2^2 + \cdots + u_n^2 = 1$, i.e., we need the sum of the squared components to also be 1! How do we achieve this if $\|\vec{v}\| \ne 1$? We normalise it:

$\hat{u} = \dfrac{\vec{v}}{\|\vec{v}\|}$
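
In code, normalising is just dividing by the length. A numpy sketch (example vector only):

```python
import numpy as np

v = np.array([3.0, 4.0])
u = v / np.linalg.norm(v)   # unit vector pointing the same way as v

print(u)                    # [0.6 0.8]
print(np.linalg.norm(u))    # 1.0
```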

### Orthogonal Vectors and Linear Independence

*Figure 1*

**Orthogonal vectors** are vectors that are perpendicular to each other.
In a standard 2D or 3D graph, this means that
they are at right angles to each other and we can visualise them
as seen in Figure 1: each of the vectors shown is at 90 degrees to the others.

In any set of vectors, if every (non-zero) vector in the set
is orthogonal to every other vector in the set, then the vectors are
**linearly independent**.

So, how do we tell if one vector is orthogonal to another? The answer is
the **dot product**, which we defined above.

Two **vectors are orthogonal when their dot product is
zero**: $\vec{x} \cdot \vec{y} = 0 \iff \vec{x}$ and $\vec{y}$ are orthogonal.

But why is this the case? Let's imagine any two arbitrary vectors, each on the circumference of a unit circle (so we know that they have the same length, and one is therefore a proper rotation of the other around the centre of the circle). This is shown in Figure 2. From the figure, we know that $\vec{a} = (\cos\alpha, \sin\alpha)$ and $\vec{b} = (\cos(\alpha + \theta), \sin(\alpha + \theta))$.

Figure 2

The vector $\vec{b}$ is the vector $\vec{a}$ rotated by $\theta$ degrees.
We can use the following
trig identities:

$\cos(\alpha + \theta) = \cos\alpha\cos\theta - \sin\alpha\sin\theta$
$\sin(\alpha + \theta) = \sin\alpha\cos\theta + \cos\alpha\sin\theta$

Substituting these into the formulas above, we get the following:

$\vec{b} = (\cos\alpha\cos\theta - \sin\alpha\sin\theta,\; \sin\alpha\cos\theta + \cos\alpha\sin\theta)$

For a 90 degree rotation we know that $\cos 90^\circ = 0$
and $\sin 90^\circ = 1$. Substituting these values into the above equation
we can clearly see that $\vec{b} = (-\sin\alpha, \cos\alpha)$.
Therefore, any vector $(x, y)$ in 2D space that is rotated by 90 degrees will become
$(-y, x)$, and therefore the dot product becomes $x(-y) + y(x) = 0$. And
voilà, we now know **why** the dot product of two orthogonal 2D vectors is
zero. I'm happy to take it on faith that this extrapolates into n dimensions :)
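
We can sanity-check this numerically: rotate an arbitrary 2D vector by 90 degrees, i.e. map $(x, y)$ to $(-y, x)$, and the dot product with the original comes out as zero. A small numpy sketch (made-up vector):

```python
import numpy as np

v = np.array([2.0, 5.0])
v_rot = np.array([-v[1], v[0]])  # (x, y) -> (-y, x): a 90 degree rotation

print(np.dot(v, v_rot))          # 0.0 -> the vectors are orthogonal
```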

Another way of looking at this is to consider the law of cosines: $c^2 = a^2 + b^2 - 2ab\cos\theta$:

Here $a$ and $b$ represent the magnitudes of the vectors $\vec{a}$ and $\vec{b}$, i.e., $\|\vec{a}\|$ and $\|\vec{b}\|$, and $c$ represents the magnitude of the vector $\vec{c} = \vec{a} - \vec{b}$. Thus, we have this:

$\|\vec{a} - \vec{b}\|^2 = \|\vec{a}\|^2 + \|\vec{b}\|^2 - 2\|\vec{a}\|\|\vec{b}\|\cos\theta$

When the angle between the vectors is 90 degrees, $\cos\theta = 0$ and so that part of the equation disappears to give us $\|\vec{a} - \vec{b}\|^2 = \|\vec{a}\|^2 + \|\vec{b}\|^2$. Now... recall that $\|\vec{v}\|^2 = \vec{v} \cdot \vec{v}$ and that therefore $\|\vec{a} - \vec{b}\|^2 = (\vec{a} - \vec{b}) \cdot (\vec{a} - \vec{b})$.

Based on this we can say that $(\vec{a} - \vec{b}) \cdot (\vec{a} - \vec{b}) = \vec{a} \cdot \vec{a} - 2\,\vec{a} \cdot \vec{b} + \vec{b} \cdot \vec{b} = \|\vec{a}\|^2 + \|\vec{b}\|^2$. Now, all of the $\|\vec{a}\|^2$ and $\|\vec{b}\|^2$ terms above cancel out, leaving us with $-2\,\vec{a} \cdot \vec{b} = 0$. Which is just twice the dot product when the vectors are at 90 degrees - or just the dot product because we can divide through by 2.

## Matrices

### Properties

Khan Academy has great articles. This is another good reference from MIT.

#### Scalar Multiplication

- Associative: $(cd)A = c(dA)$
- Distributive: $c(A + B) = cA + cB$ and $(c + d)A = cA + dA$
- Identity: $1A = A$
- Zero: $0A = O$ (the zero matrix)
- Closure: $cA$ is a matrix of the same dimensions as $A$.

Note that scalar multiplication is *not* commutative!

#### Matrix Addition

- Commutative: $A + B = B + A$
- Associative: $(A + B) + C = A + (B + C)$
- Zero: For any matrix $A$, there is a unique matrix $O$ such that $A + O = A$
- Inverse: $A + (-A) = O$
- Closure: $A + B$ is a matrix of the same dimensions as $A$ and $B$.

#### Matrix Multiplication

- Associative: $(AB)C = A(BC)$
- Distributive: $A(B + C) = AB + AC$ and $(B + C)A = BA + CA$
- Identity: $IA = AI = A$
- Zero: $OA = AO = O$
- Inverse: Only if $A$ is a *square* matrix can it have an inverse $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$. Note that *not all square matrices are invertible*.
- Dimension: If $A$ is an $m \times n$ matrix and $B$ is an $n \times k$ matrix then $AB$ is an $m \times k$ matrix.

Note that matrix multiplication is *not* commutative: in general $AB \ne BA$!

Be careful: the equation $AB = O$ does not necessarily mean that $A = O$ or $B = O$, and for a general matrix $A$, $AB = AC$ does not necessarily mean $B = C$, unless $A$ is invertible.

#### Matrix Transpose

- Associative (with scalar mult): $(cA)^T = cA^T$
- Distributive (into addition): $(A + B)^T = A^T + B^T$
- Distributive (into mult): $(AB)^T = B^TA^T$ (note the reversal of order)
- Identity: $I^T = I$
- Inverse: $(A^{-1})^T = (A^T)^{-1}$

#### Matrix Inverse

- Associative (self with inverse): $AA^{-1} = A^{-1}A = I$
- Distributive (into scalar mult): $(cA)^{-1} = \frac{1}{c}A^{-1}$, for $c \ne 0$
- Distributive (into mult): $(AB)^{-1} = B^{-1}A^{-1}$ (note the reversal of order)
- Identity: $I^{-1} = I$
- Name?: $(A^{-1})^{-1} = A$
- Name?: $(A^T)^{-1} = (A^{-1})^T$
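
A quick numeric sanity check of a couple of these properties, e.g. $(AB)^T = B^TA^T$ and $(AB)^{-1} = B^{-1}A^{-1}$, using sympy (the matrices are arbitrary invertible examples made up for the check):

```python
from sympy import Matrix

A = Matrix([[2, 1],
            [1, 1]])
B = Matrix([[1, 3],
            [0, 1]])

# Transpose of a product reverses the order.
print((A * B).T == B.T * A.T)              # True

# Inverse of a product also reverses the order.
print((A * B).inv() == B.inv() * A.inv())  # True
```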

### Matrices, Simultaneous Linear Equations And Reduced Row Echelon Form

References:

- Lecture 18: The Rank of a Matrix and Consistency of Linear Systems, Ohio University.
- Types of Solution Sets.
- Row Echelon Form and Number of Solutions.

A set of linear equations,

$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1, \quad \ldots, \quad a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m$,

can be represented in matrix form:

$A\vec{x} = \vec{b}$

Any vector $\vec{x}$, where $A\vec{x} = \vec{b}$, is a solution to the set of simultaneous
linear equations. How can we solve this from the matrix? We create an **augmented or partitioned matrix**, $[A \mid \vec{b}]$.

Next this matrix must be converted to **row echelon form**. This is where each row has
more zeros to the left of its first non-zero entry than the row above. For example, the following matrix is
in row echelon form:

$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{bmatrix}$

But, this next one is not:

$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & 1 & 4 \end{bmatrix}$
We go from a matrix to the equivalent in row-echelon form using **elementary row operations**:

- Row interchange: $R_i \leftrightarrow R_j$
- Row scaling: $R_i \rightarrow kR_i$, where $k \ne 0$
- Row addition: $R_i \rightarrow R_i + kR_j$

In row echelon form:

- All rows containing only zeros appear below rows with nonzero entries, and
- The first nonzero entry in any row appears in a column to the right of the first nonzero entry in any preceding row.

The first non zero entry in a row is called the *pivot* for that row.

And woop, woop, even more important than echelon form is **reduced row-echelon form (RREF)**. This is when:

- If a row has all zeros then it is below all rows with a non-zero entry.
- The left-most non-zero entry in each row is 1 - it is called the *pivot*.
- The left-most non-zero entry in each row is the only non-zero entry in its column.
- Each row has more zeros to the left of the pivot than the row above.

Or, to put it another way, when it is a matrix in row-echelon form where:

- All pivots are equal to 1, and
- Each pivot is the only non-zero in its column.

Some important terms are *pivot position* and *pivot column*. The position is the location of a leading 1, which meets the above criteria, and the column is a column that contains a pivot position.

So why do we give a damn about RREF if we have row-echelon form? Well, one reason is it lets us
figure out if a set of linear equations has a solution, and which variables are *free* and
which are *dependent*. The reason for this is that the RREF of every matrix is *unique* (whereas merely row reducing a matrix can have many resulting forms).

The *free* variable is one that you have control over, i.e. you can choose and manipulate its value.
The *dependent* one depends on the values of the other variables in the experiment or is constant.

*Pivot columns represent variables that are dependent*. Why?
Recall that the column vector containing a pivot is all-zeros except
for the pivot. So if element *j* in a row of an RREF matrix is a pivot, then in our
system of equations the variable $x_j$
appears in only this one equation of our RREF system. Thus, its value must be determined, as this is
the only place it occurs. It must either be a constant or a combination of other variables.

Because column vectors containing a pivot are all-zeros except
for the pivot, the **set of pivot columns must also be LI** (see later).
You can always represent the free columns as linear combinations of the
pivot columns.

If we have an augmented matrix $[A \mid \vec{b}]$:

- *No Solution*. If the last (augmented) column of $[A \mid \vec{b}]$ (i.e. the column which contains $\vec{b}$) contains a pivot entry.
- *Restricted Solution Set*. If the last row of the matrix is all zeros except the last (augmented) entry, which is a function of one or more of the variables.
- *Exactly One Solution*. If the last (augmented) column of $[A \mid \vec{b}]$ (i.e. the column which contains $\vec{b}$) does not contain a pivot entry and all of the columns of the coefficient part (i.e. the columns of $A$) do contain a pivot entry.
- *Infinitely Many Solutions*. If the last (augmented) column of $[A \mid \vec{b}]$ (i.e. the column which contains $\vec{b}$) does not contain a pivot entry and at least one of the columns of the coefficient part (i.e. the columns of $A$) also does not contain a pivot entry.

Let's look at a quick example of a matrix in echelon form:

We can get the solution to the set of simultaneous linear equations using back substitution. Directly from the above row-echelon form matrix (useful because the last row has one pivot and no free variables) we know the value of the last variable. Substituting that back into the second-to-last equation gives us the next variable, and substituting both of those back into the first equation gives us the remaining variable.

We can use RREF to accomplish the same thing and find the solutions to the above. Which means we come to the same conclusion as we did when doing back substitution, but were able to do it using elementary row operations.

We can now talk about the **general solution vector**, $\vec{x}$: the vector that expresses every variable in the system, with any free variables left as parameters.
We can also talk about the **solution set**, which is the set of all vectors that result in a valid solution to our system of equations. In this case our solution set is trivial: it just has the one vector in it, because the RREF had pivots in every column so there is exactly one solution.

Let's, therefore, look at something more interesting: an RREF matrix with two free variables! Say $x_3 = s$ and $x_4 = t$, where $s$ and $t$ can be any values we like. How do we get the general solution vector? We know from the RREF matrix how each pivot variable is expressed in terms of the free variables. We replace the free variables with $s$ and $t$ and then solve for the pivots, thus getting the general solution vector.

The above can then be represented using the **general solution vector**,
and the **solution set** thus becomes the set of all general solution vectors as $s$ and $t$ vary.

We can find the reduced row echelon form of a matrix in Python using the `sympy` package and the `rref()` function:

    from sympy import Matrix, init_printing

    init_printing(use_unicode=True)

    system = Matrix(( (3, 6, -2, 11), (1, 2, 1, 32), (1, -1, 1, 1) ))
    system
    system.rref()[0]

Which outputs:

    ⎡1  0  0  -17/3⎤
    ⎢              ⎥
    ⎢0  1  0  31/3 ⎥
    ⎢              ⎥
    ⎣0  0  1   17  ⎦

To then apply this to our augmented matrix we could do something like the following. Let's say we have a set of 3 simultaneous equations with 3 unknowns: $3x + 6y - 2z = 11$, $x + 2y + z = 32$ and $x - y + z = 1$. We'd make an augmented matrix like so: $\left[\begin{array}{ccc|c} 3 & 6 & -2 & 11 \\ 1 & 2 & 1 & 32 \\ 1 & -1 & 1 & 1 \end{array}\right]$. To solve this using Python we do:

    from sympy import Matrix, solve_linear_system, init_printing
    from sympy.abc import x, y, z

    init_printing(use_unicode=True)

    system = Matrix(( (3, 6, -2, 11), (1, 2, 1, 32), (1, -1, 1, 1) ))
    system
    solve_linear_system(system, x, y, z)

Which will give the following output:

    ⎡3  6   -2  11⎤
    ⎢             ⎥
    ⎢1  2   1   32⎥
    ⎢             ⎥
    ⎣1  -1  1   1 ⎦

    {x: -17/3, y: 31/3, z: 17}

### Matrix / Vector Multiplication

Matrix/vector multiplication is pretty straightforward: the $i^{th}$ element of $A\vec{x}$ is the dot product of the $i^{th}$ row of $A$ with $\vec{x}$.

But having said that, I'd never really thought of it as shown below until watching some stuff on Khan Academy:

We can think of it as the linear combination of the column vectors of the matrix where the coefficients are the members of the vector, where the first column vector is multiplied by the first element, the second column vector by the second element and so on.

So, for any $m \times n$ matrix $A$ with columns $\vec{a}_1, \ldots, \vec{a}_n$, and vector $\vec{x} \in \mathbb{R}^n$, we can say that $A\vec{x} = x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_n\vec{a}_n$.
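
A small numpy sketch of that column view (made-up matrix and vector): the product $A\vec{x}$ equals the linear combination of $A$'s columns.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, -1.0])

# Standard matrix/vector product...
product = A @ x

# ...is the same as the linear combination of A's columns, with the
# elements of x as the coefficients.
combo = x[0] * A[:, 0] + x[1] * A[:, 1]

print(product)                      # [ 8. 26. 44.]
print(np.allclose(product, combo))  # True
```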

### Null Space Of A Matrix

For an $m \times n$ matrix $A$, the null space of $A$ is the set of all vectors $\vec{x} \in \mathbb{R}^n$ for which $A\vec{x} = \vec{0}$, defined as follows:

$N(A) = \{\vec{x} \in \mathbb{R}^n \mid A\vec{x} = \vec{0}\}$

In words, the *null space of a matrix is the set of all solutions
that result in the zero vector*, i.e. the solution set to the above. We can get this set by using the normal reduction to RREF to solve the above ($A\vec{x} = \vec{0}$). The solution set to this is the null space.

The system of linear equations $A\vec{x} = \vec{0}$ is called **homogeneous**. It is always
**consistent**, i.e., has a solution, because $\vec{x} = \vec{0}$ is always a solution (and is called the **trivial** solution).

*The null space of a matrix is a valid subspace of $\mathbb{R}^n$* because
it satisfies the following conditions, as we saw in the
section on subspaces:

- It **contains the zero vector**: $A\vec{0} = \vec{0}$, so $\vec{0} \in N(A)$.
- It **is closed under scalar multiplication**: if $A\vec{x} = \vec{0}$ then $A(c\vec{x}) = cA\vec{x} = \vec{0}$.
- It **is closed under addition**: if $A\vec{a} = \vec{0}$ and $A\vec{b} = \vec{0}$ then $A(\vec{a} + \vec{b}) = A\vec{a} + A\vec{b} = \vec{0}$.

It can also be shown that $N(A) = N(\text{rref}(A))$, where $\text{rref}(A)$ is the reduced row echelon form of $A$.

The null space of a matrix can be related to the linear independence of its column
vectors. Using the above definition of the null space, we can write:

$A\vec{x} = \vec{0}$

Recalling how we can write out matrix/vector multiplication as a linear combination of columns,
we get:

$x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_n\vec{a}_n = \vec{0}$

Recall from the section on linear independence that the above set of vectors
is said to be linearly *dependent* when this equation has a solution in which not all of the $x_i$ are zero.
Given that our null space is the set of all vectors that satisfy the above, *by definition*,
we can say that **the column vectors of a matrix $A$ are LI if and only if $N(A) = \{\vec{0}\}$**.

To *find* the null space of a matrix $A$, augment it with the zero vector and convert it to RREF. The solution set is the null space. So,
if every column has a pivot, the null space contains only the zero vector, because there is only one solution (which therefore must be $\vec{0}$).
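
sympy can also do this for us with `nullspace()`. A sketch with a made-up matrix:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])   # second row is a multiple of the first

# Basis vectors for the null space: every solution of Ax = 0 is a
# linear combination of these.
for v in A.nullspace():
    print(v.T)            # printed as rows for compactness
```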

TODO: https://math.stackexchange.com/questions/21131/physical-meaning-of-the-null-space-of-a-matrix

### Nullity Of A Matrix

We saw in the subspace section that the dimension of a subspace is the number of elements in a basis for that subspace. The dimension of
the null space is called the **"nullity"**: $\text{nullity}(A) = \dim(N(A))$.

In a basis for the null space you will always have *as many vectors as you have free variables in the RREF of the matrix*. So the *nullity is the number of free variables in the RREF of the matrix with which this null space is associated*.

### Column Space Of A Matrix

The column space of a matrix is the set of all linear combinations
of its column vectors, which is the span of the column vectors:

$C(A) = \text{span}(\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_n)$

Note that the column vectors of $A$ might not be LI. The set can be
reduced so that only the LI vectors are used, which means the above still holds,
and then we say that the LI set is a *basis for $C(A)$*.
The column space of $A$ is a valid subspace.

For the system of equations $A\vec{x} = \vec{b}$, $C(A)$ is, therefore, the set of all values that $A\vec{x}$ can be.

Therefore, if $\vec{b}$ is not in $C(A)$, then we know that no solution exists, and conversely, if it is, then we know that at least one solution exists.

The set of vectors in the column space of may or may not be linearly independent. When reduced to being LI then we have a *basis for the
column space of the matrix*.

We saw the following in the section on null spaces: it can be shown that the column vectors of a matrix are LI if and only if $N(A) = \{\vec{0}\}$. So we can figure out if we have a basis by looking at $A$'s null space. Recall that $N(A) = N(\text{rref}(A))$ and that the column vectors of a matrix $A$ are LI if and only if $N(A) = \{\vec{0}\}$!

How do we figure out the set of column vectors that are LI? To do this,
put the matrix in RREF and select the columns that have pivot entries (the pivot columns).
*The corresponding pivot column vectors of the original matrix are the basis vectors for $C(A)$* :) The reason is that you can always write the columns with the free variables
as linear combinations of the pivot columns.

So to find the column space of $A$, pick the columns from $A$ that correspond to (have the same position as) the pivot columns in $\text{rref}(A)$.

A really interesting property of the column space is the **column space criterion**, which states that a linear system $A\vec{x} = \vec{b}$ has a solution iff $\vec{b}$ is in the column space of $A$.
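
sympy's `columnspace()` does this for us; as far as I can tell it returns the columns of $A$ sitting in the pivot positions of $\text{rref}(A)$. A sketch with a made-up matrix (the third column is deliberately a combination of the first two):

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [0, 1, 1],
            [1, 3, 4]])   # third column = first column + second column

print(A.rref()[1])        # (0, 1) -> the pivot columns
print(A.columnspace())    # basis for C(A): the first two columns of A
```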

### Rank Of A Matrix

We can now define what the rank of a matrix is: the rank is the number of vectors in a basis for the column space of $A$. You can get this by counting the pivot columns in $\text{rref}(A)$.

The **rank equation** is $\text{rank}(A) + \text{nullity}(A) = n$, where $A$ is an $m \times n$ matrix.

If $A$ is reduced to row-echelon form $B$, then (see the sketch after this list):

- $\text{nullity}(A)$ is the number of free variables in the solution space of $A\vec{x} = \vec{0}$, which equals the number of pivot-free columns in $B$.
- $\text{rank}(A)$ is the number of pivots in $B$.
- $\text{rank}(A) + \text{nullity}(A)$ equals the number of columns of $A$.
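
A quick sketch verifying the rank equation with sympy (made-up matrix):

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])       # a 2x3 matrix with only one independent row

rank = A.rank()
nullity = len(A.nullspace())  # nullity = number of null space basis vectors

print(rank, nullity, A.cols)  # 1 2 3 -> rank + nullity == number of columns
```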

TODO: https://math.stackexchange.com/questions/21100/importance-of-matrix-rank

### Basis Of The Column Space VS Basis Of The Null Space

Really good answer by John-Paul Meyer on the Khan Academy website:

... a "basis of a column space" is different than a "basis of the null space", for the same matrix....

A basis is a set of vectors related to a particular mathematical "space" (specifically, to what is known as a vector space). A basis must:

- be linearly independent and
- span the space.

The "space" could be, for example: $\mathbb{R}^2$, $\mathbb{R}^3$, ..., $\mathbb{R}^n$, the column space of a matrix, the null space of a matrix, a subspace of any of these, and so on...

### Orthogonal Matrices

So, onto orthogonal matrices. A **matrix $A$ is orthogonal if $A^TA = AA^T = I$**.
If we take a general matrix and multiply it by its transpose we get...

The pattern looks pretty familiar, right?! Each entry looks a lot like
$\sum_i a_ib_i$, our formula for the dot product of two vectors. So,
if we say that $A$ is a matrix of two *row vectors*, $\vec{a}$ and $\vec{b}$, we will get the following:

$AA^T = \begin{bmatrix} \vec{a} \cdot \vec{a} & \vec{a} \cdot \vec{b} \\ \vec{b} \cdot \vec{a} & \vec{b} \cdot \vec{b} \end{bmatrix}$

If our vectors $\vec{a}$ and $\vec{b}$ are orthogonal,
then the $\vec{a} \cdot \vec{b}$ components must be zero. This would give us the first
part of the identity matrix pattern we're looking for.

The other part of the identity matrix would imply that we would have to have $\vec{a} \cdot \vec{a} = 1$ and $\vec{b} \cdot \vec{b} = 1$... which are just the formulas for the square of the length of a vector. Therefore, if our two vectors are also normal (i.e., have a length of 1), we have our identity matrix.

Does the same hold true for $A^TA$? It doesn't if we use our original matrix $A$!... Oops, we can see that we didn't get the identity matrix!! But, perhaps we can see why. If $A$ was a matrix of row vectors then $A^T$ is a matrix of column vectors. So for $AA^T$ we were multiplying a matrix of row vectors with a matrix of column vectors, which would, in part, give us the dot products as we saw. So if we want to do $A^TA$ it would follow that, for this to work, $A$ now has to be a matrix of column vectors, because then $A^T$ becomes a matrix of row vectors and we get back to our original pattern of dot products.

So we can say that if we have a **matrix whose rows are orthonormal vectors
and whose columns are also orthonormal vectors, then we have
an orthogonal matrix**!

Okay, that's great n' all, but **why should we care? Why are orthogonal
matrices useful?** It turns out that orthogonal matrices
preserve
angles and lengths of vectors. This can be useful in graphics to rotate
vectors but keep the shape they construct, or in numerical analysis
because they do not amplify errors.
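
For example, a 2D rotation matrix is orthogonal. A short numpy sketch (the rotation angle and vector are arbitrary) showing $Q^TQ = I$ and that lengths are preserved:

```python
import numpy as np

theta = np.radians(30)                 # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([3.0, 4.0])

print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q is orthogonal
print(np.linalg.norm(v), np.linalg.norm(Q @ v))  # 5.0 5.0 -> length preserved
```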

### Determinants

This is one of the better YouTube tutorials I've found that explains the *concepts*, and not just the
mechanics, behind determinants, by 3Blue1Brown (awesome!).

### Eigenvectors and Eigenvalues

Stackoverflow thread: What is the importance of eigenvalues and eigenvectors.

Eigenvectors and values exist in pairs: every eigenvector has a corresponding eigenvalue. An eigenvector is a direction, ... An eigenvalue is a number, telling you how much variance there is in the data in that direction ...
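
A minimal sympy sketch (made-up matrix), checking the defining property $A\vec{v} = \lambda\vec{v}$ for each eigenvalue/eigenvector pair:

```python
from sympy import Matrix

A = Matrix([[2, 0],
            [0, 3]])

# Each entry is (eigenvalue, algebraic multiplicity, [eigenvectors]).
for eigenvalue, multiplicity, vectors in A.eigenvects():
    for v in vectors:
        print(eigenvalue, v.T, A * v == eigenvalue * v)  # A v == lambda v
```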

## Linear Transformations

### Functions

#### Scalar Functions

Normally a function is a *mapping* between two sets: from a *domain* to a *range*, which is
a subset of the function's *co-domain*. I.e., the co-domain is the set of all possible things the
function could map onto and the range is the subset of the co-domain containing the things the
function actually maps to. If we have two sets, $X$ and $Y$, we say that if $f$ is a function, then it associates elements of $X$ with elements of $Y$. I.e., $f$ is a *mapping* between $X$ and $Y$, and we write $f: X \rightarrow Y$.

The usual function notation looks like this: $f(x) = y$. Note that $f$ is the function, not $f(x)$, which is the value output by the function, $f$, when $x$ is input. There are other notations we can use to denote functions.

For example, a function of a real variable that returns a real number can be described as a *mapping* from the
set of all real numbers to the set of all real numbers.
It can also be defined using this notation: $f: \mathbb{R} \rightarrow \mathbb{R}$.

A function is not, however, just any simple mapping between two sets. For a mapping to be a function the following must be true: if $f(x) = y_1$ and $f(x) = y_2$ then $y_1 = y_2$. In other words, one item in the domain cannot map to more than one item in the range, but two distinct items in the domain can map to the same item in the range.

#### Vector Valued Functions

Functions that map to $\mathbb{R}$ are called scalar valued functions, or specifically in this case, real valued functions.

If the co-domain of a function is $\mathbb{R}^m$, where $m > 1$, then we say that the function
is *vector valued*. Functions whose domain is $\mathbb{R}^n$ are called
*transformations*.

A vector valued function could look like almost anything... the point is that, because it
operates on a vector, we call it a *transformation*.

#### Surjective (Onto) Functions

A surjective, or onto, function provides a mapping from the domain onto
*every* element of the co-domain. I.e., the range *is* the
co-domain, or the image of the function is the co-domain. If $f: X \rightarrow Y$ and $f$ is surjective, then $\text{im}(f) = Y$.
TODO: Insert diagram

#### Injective (One-To-One) Functions

An injective, or one-to-one, function provides a mapping such that no two elements from the domain map to the same value in the co-domain. Note that not all co-domain members need to be mapped to, however. TODO: Insert diagram

#### Bijective (One-To-One And Onto)

A function is bijective (one-to-one and onto) when it is both surjective (onto) and injective (one-to-one). TODO: Insert diagram

#### Invertible Functions

A function $f: X \rightarrow Y$ is invertible if there exists another function $f^{-1}: Y \rightarrow X$ such that $f^{-1} \circ f = I_X$ and $f \circ f^{-1} = I_Y$.

In the above, $I$ is the identity function: the function where $I(x) = x$ for every $x$.

If a function is invertible then its inverse is *unique* and there is only *one unique
solution* to $f(x) = y$ for each $y$.

To summarise, the function $f$ is invertible iff, for every $y$ in the co-domain, there is a *unique* $x$ in the domain such that $f(x) = y$. The "for every $y$ in the co-domain" part of the expression implies that the range of the function is the co-domain, which means that the function is surjective (onto), and the "unique" part of the expression means that the function is also injective (one-to-one), which thus means that the function is bijective (one-to-one and onto).

*Thus to be invertible a function must be bijective (one-to-one and onto)!!*

### Linear Transformations

Vector valued functions are *transformations*. Linear transformations are a particular
type of transformation that obeys the following rules. If $\vec{a}, \vec{b} \in \mathbb{R}^n$ and
$T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ then:

- $T(\vec{a} + \vec{b}) = T(\vec{a}) + T(\vec{b})$, and
- $T(c\vec{a}) = cT(\vec{a})$.

In other words, the origin must remain fixed and all lines must remain lines.

Matrices are used to do the linear transformations, so the function will generally
be defined as $T(\vec{x}) = A\vec{x}$, where $A$ is the *transformation matrix*. A $2 \times 2$
matrix transforms 2D space, a $3 \times 3$ matrix transforms 3D space, and so on...

Interestingly, *the column vectors of the transformation matrix $A$ tell us what
happens to the canonical basis vectors in that space*, and from this we can work out
how *any* vector would be transformed.

We have seen that in Cartesian space, $\mathbb{R}^2$, for any coordinate $(x, y)$, the numbers $x$ and $y$ are the coefficients of the canonical basis vectors $\hat{i}$ and $\hat{j}$... Let's say our transformation matrix maps $\hat{i}$ to some vector $\vec{a}_1$ and $\hat{j}$ to some vector $\vec{a}_2$. Well, we've seen we can re-write any vector as a combination of the above basis vectors, $\vec{v} = x\hat{i} + y\hat{j}$, so by linearity we can write $T(\vec{v}) = xT(\hat{i}) + yT(\hat{j}) = x\vec{a}_1 + y\vec{a}_2$, which looks suspiciously like a matrix/vector product. If we describe the transformation as $A = [\vec{a}_1 \; \vec{a}_2]$, then $T(\vec{v}) = A\vec{v}$. And so we can see that the first column vector of $A$ shows us where the first canonical basis vector maps to and the second column vector of $A$ shows us where the second canonical basis vector maps to... woop! We can also note that this is the same as taking the transform of each of the column vectors of the identity matrix!

We can see this more "solidly" in the sketch below...

We can see how one coordinate can be represented using the canonical Cartesian coordinates, but also coordinates in our transformed space. Thus the same "point" can be represented using different coordinates (coefficients of the basis vectors) for different transform spaces.

Another important point is that it can be shown that *all matrix-vector products are linear transformations*!
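
A small numpy sketch of the idea above, using a 90 degree rotation as the example transform: the columns of the transformation matrix are exactly where the canonical basis vectors end up.

```python
import numpy as np

# Transformation matrix for a 90 degree anti-clockwise rotation.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])

print(A @ i_hat)   # [0. 1.]  -> the first column of A
print(A @ j_hat)   # [-1. 0.] -> the second column of A

# Any vector transforms as the same linear combination of those columns.
x = np.array([2.0, 3.0])
print(A @ x, x[0] * (A @ i_hat) + x[1] * (A @ j_hat))  # both [-3. 2.]
```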

Linear transformations also have these important properties. If we take two transformations $S$ and $T$, where $S$ has the transformation matrix $A$ and $T$ has the transformation matrix $B$:

- Transformation addition: $(S + T)(\vec{x}) = S(\vec{x}) + T(\vec{x}) = (A + B)\vec{x}$.
- Scalar multiplication: $(cS)(\vec{x}) = cS(\vec{x}) = (cA)\vec{x}$.

### Inverse Linear Transformations

We saw in the section on functions that an invertible function has to be bijective (one-to-one and onto). As a transform is just a vector valued function, the same condition for invertibility exists. Recall that a transform is invertible iff, for every $\vec{y}$ in the co-domain, there is a *unique* $\vec{x}$ such that $T(\vec{x}) = \vec{y}$, where the "for every" part indicates that the range is the co-domain (the mapping is surjective, or onto) and the "unique" part indicates that the mapping is unique (injective, or one-to-one).

It can be shown that **invertibility implies a unique solution to $T(\vec{x}) = \vec{b}$**.

We want to understand under what conditions our standard linear transform, $T(\vec{x}) = A\vec{x}$, is invertible, i.e. when is it both injective (one-to-one) and surjective (onto)?

#### Surjective (Onto)

Start with *surjective (onto)*... we are looking at $T(\vec{x}) = A\vec{x} = \vec{b}$,
i.e., asking whether a solution exists for every $\vec{b}$ in the co-domain...

We can write $A\vec{x}$ as the column vectors of $A$ multiplied by the elements of $\vec{x}$:

$A\vec{x} = x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_n\vec{a}_n$

Remember that a surjective, or onto, function provides a mapping from the domain to *every*
element of the co-domain. Thus for our transform to be surjective (onto) *the set of the above
linear combinations must span the co-domain, $\mathbb{R}^m$*.

This means that for $T$ to be surjective (onto) *the column vectors of $A$ have to span
$\mathbb{R}^m$, i.e., the column space of $A$ must be all of $\mathbb{R}^m$* (look at the equation above to see why). I.e. $C(A) = \mathbb{R}^m$. Thus, the
challenge becomes finding this out, because if it is true we are half way there to showing
that the transform is invertible.

For the column space to span the co-domain (i.e. $C(A) = \mathbb{R}^m$) we would need the system $A\vec{x} = \vec{b}$ to have a solution for every possible $\vec{b}$. We saw in the section on RREF that when an augmented matrix is reduced to RREF we can determine the number of solutions by looking at where the pivots are.

To recap, when a matrix is in RREF:

- Free variables mean many solutions, which is what we want!
- No free variables (all columns have a pivot) means a unique solution.
- A pivot in the augmented column means no solution, and a function of the variables in the augmented column means a restricted solution set.

Thus, to show that a transform is surjective (onto) we need the RREF of the transformation matrix to give at least one solution for every $\vec{b}$... i.e., there is a unique solution or there are many solutions - the first two items on the list above.

Thus, there must be a pivot in every row of our matrix $A$, so for $T$ to be onto we can say $\text{rank}(A) = m$, where $m$ is the number of rows.

#### Injective (One-To-One)

For invertibility, onto is not enough. We need one-to-one as well! It can be shown that the
solution set to $A\vec{x} = \vec{b}$ is the set of shifted versions of the null space. I.e.,
assuming that $A\vec{x} = \vec{b}$ has a particular solution $\vec{x}_p$, the solution set is
$\{\vec{x}_p + \vec{n} \mid \vec{n} \in N(A)\}$
[Ref].

This means that for $T$ to be one-to-one we require the null space of $A$ to contain only the zero vector. I.e., **for a transformation matrix to be one-to-one, its null space must be trivial**.

If the null space of $A$ is trivial then the column vectors of $A$ are LI. Because the column vectors are LI, they form a basis of $n$ vectors, so $\text{rank}(A) = n$.

Thus, **the transform is one-to-one if and only if $\text{rank}(A) = n$**.

#### Invertible: Injective (One-To-One) *and* Surjective (Onto)

We saw that to be surjective, $\text{rank}(A) = m$, and to be injective, $\text{rank}(A) = n$.
Therefore, **to be invertible, $A$ must be a square ($n \times n$) matrix**. As its rank is $n$, this also means that every column must contain a pivot, and because all the columns form a basis, i.e., are LI, if we reduce $A$ to RREF, it must be the identity matrix, $I_n$. Thus,

$\text{rref}(A) = I_n$ **must also be true**.

The inverse mapping is also linear. To recap, linearity implies $T^{-1}(\vec{a} + \vec{b}) = T^{-1}(\vec{a}) + T^{-1}(\vec{b})$ and $T^{-1}(c\vec{a}) = cT^{-1}(\vec{a})$.

#### Finding The Inverse

To find the inverse of a matrix $A$, we augment it with the equivalently sized identity matrix, $[A \mid I]$, and reduce the whole thing to RREF.

Let $A$ be an invertible square matrix.
Augment $A$ with $I$ to form $[A \mid I]$ and reduce: the left-hand block becomes $I$ and the right-hand block becomes $A^{-1}$, i.e. $[A \mid I] \rightarrow [I \mid A^{-1}]$.
We can check that the result is indeed the inverse because $AA^{-1} = I$ and $A^{-1}A = I$.
As we've seen, if $A$ isn't square then don't bother - it doesn't have an inverse. If it is square
but does not reduce to the identity matrix in RREF, then it is **singular** and does not have an inverse.
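
A sketch of the same procedure in sympy (the matrix is a made-up invertible example): augment with the identity, reduce to RREF, and read the inverse off the right-hand block.

```python
from sympy import Matrix, eye

A = Matrix([[2, 1],
            [1, 1]])

# Augment A with the identity matrix and reduce to RREF.
augmented = A.row_join(eye(2))
rref_matrix, _ = augmented.rref()

inverse = rref_matrix[:, 2:]     # right-hand block is A^-1
print(inverse)                   # Matrix([[1, -1], [-1, 2]])
print(inverse == A.inv())        # True
print(A * inverse == eye(2))     # True
```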

### Projections

The graph to the left shows a vector $\vec{x}$ and its *projection* onto the line $L$, defined by the vector $\vec{v}$.
The projection of $\vec{x}$ onto $L$ describes *how much of $\vec{x}$ goes in the direction of $\vec{v}$*.

We write "the projection of $\vec{x}$ onto $L$" like so: $\text{proj}_L(\vec{x})$. The projection is found by dropping a line, perpendicular to $L$, from the top of $\vec{x}$. Where this line hits $L$ is the end of the vector that describes the projection. This is shown on the graph by the smaller green vector labelled $\text{proj}_L(\vec{x})$.

Because $\vec{x} - \text{proj}_L(\vec{x})$ is perpendicular to $\vec{v}$ we can say the following (recall that perpendicular vectors' dot product is zero): writing $\text{proj}_L(\vec{x}) = c\vec{v}$, we have $(\vec{x} - c\vec{v}) \cdot \vec{v} = 0$. Thus the projection is defined as $\text{proj}_L(\vec{x}) = \frac{\vec{x} \cdot \vec{v}}{\vec{v} \cdot \vec{v}}\vec{v}$. And if $\vec{v}$ is a unit vector $\hat{u}$, then this simplifies to $\text{proj}_L(\vec{x}) = (\vec{x} \cdot \hat{u})\hat{u}$. And we have seen in a previous section how to make a vector a unit vector. Whoopdi do!

It turns out that **projection is a linear transform**! To calculate the matrix that describes the projection transform,
apply the projection to each of the column vectors of the identity matrix $I$, where the column vectors of $I$ are the same
length as the vector over which the projection is applied. So, for example, use the columns of the $2 \times 2$ identity matrix for vectors in $\mathbb{R}^2$ to get the transformation
matrix for a projection onto a line defined by the vector $\vec{v}$:
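
A numpy sketch of that recipe (the line direction $\vec{v}$ is made up): build the transformation matrix by projecting the identity's columns, and note that it matches the closed form $\vec{v}\vec{v}^T / (\vec{v} \cdot \vec{v})$.

```python
import numpy as np

v = np.array([2.0, 1.0])           # direction vector defining the line L

def proj(x, v):
    """Projection of x onto the line spanned by v."""
    return (np.dot(x, v) / np.dot(v, v)) * v

# Apply the projection to each column of the identity matrix; the results
# are the columns of the transformation matrix.
I = np.eye(2)
P = np.column_stack([proj(I[:, 0], v), proj(I[:, 1], v)])

print(P)
print(np.allclose(P, np.outer(v, v) / np.dot(v, v)))   # True: same matrix
print(P @ np.array([1.0, 3.0]))                        # projection of (1, 3) onto L
```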

## PCA

Really liked these YouTube tutorials by Data4Bio:

- Dimensionality Reduction: High Dimensional Data, Part 1
- Dimensionality Reduction: High Dimensional Data, Part 2
- Dimensionality Reduction: High Dimensional Data, Part 3
- Dimensionality Reduction: Principal Components Analysis, Part 1
- Dimensionality Reduction: Principal Components Analysis, Part 2
- Dimensionality Reduction: Principal Components Analysis, Part 3