You could have invented matrices!
Let’s talk about vectors. A vector (denoted as , a lower-case bold italicized letter) is an element in a vector space, which means that it can be “scaled”, like (the is called a “scalar” — creative name, right?) and added, like .
In order for vector spaces and their operations to be valid, they just have to obey some common-sense rules (like associativity, commutativity, distributivity, etc.) that allow us to make meaningful conclusions.
One neat thing about vector spaces is that, in some of them, you have the ability to “decompose” any vector in it as a weighted sum of some set of basis vectors. If this is the case for your vector space, then the size of smallest possible set of basis vectors is known as the dimension of that vector space.
For example, for a 3-dimensional vector space , any vector can be described as a weighted sum of three basis vectors. If we call them , , , then:
Where , , and are scalars of .
Dimensionality is really a statement about being able to decompose any vector in that vector space into a combination of a set of basis vectors. For a 3-dimensional vector space, you can make a bases that can reproduce any vector in your space…but that’s only possible with at least three vectors. a 4-dimensional vector space, you’d need at least four vectors.
For example, in physics, we often treat reality as taking place in a three-dimensional vector space. The basis vectors are often called , , and , and so we say that we can describe our 3D physics vectors as .
One neat thing that physicists take advantage of all the time is that if we agree on a set of basis vectors and a specific ordering, we can actually encode any vector in terms of those basis vectors.
In physics, for instance, we can say “Let’s encode vectors in terms of sums of scalings of , , and , in that order.” Then, we can write as , and understand that we really mean .
It should be made clear that is not the same thing as the vector . It is an encoding of that vector, which only makes sense once we choose to agree on a specific set of basis. If we chose a different basis, we’d have a different encoding.
In the general case, for an N-dimensional vector space, this means that, with a minimum of N items, we can represent any vector in that space. And, if we agree on those N items, we can devise an encoding, such that:
will encode the vector:
Note that what this encoding represents is completely dependent on what we pick, and in what order. The basis vectors we pick are arbitrary, and determine what our encoding looks like.
To highlight this, note that the same vector has many different potential encodings — all you have to do is pick a different set of basis vectors, or even just re-arrange or re-scale the ones you already have. However, all of those encodings correspond go the same vector .
One interesting consequence of this is that any N-dimensional vector space whose scalars are in is actually isomorphic to (the vector space of N-tuples of real numbers). This means that we can basically treat any N-dimensional vector space with scalars as if it was , once we decide on the basis vectors. Because of this, we often call all N-dimensional vector spaces (whose scalars are in ) . You will often hear physicists saying that the three-dimensional vector spaces they use are . However, what they really mean is that their vector spaces is isomorphic to .
Now, one of the most interesting things in mathematics is the idea of the linear transformation. Linear transformations are useful to study because:
- They are ubiquitous. They come up everywhere in engineering, physics, mathematics, data science, economics, and pretty much any mathematical theory. And there are even more situations which can be approximated by linear transformations.
- They are mathematically very nice to work with and study, in practice.
A linear transformation, , is a function that “respects” addition and scaling:
This means that if you scale the input, the output is scaled by the same amount. And also, if you transform the sum of two things, it’s the same as the sum of the transformed things (it “distributes”).
Note that I snuck in vector notation, because the concept of vectors are perfectly suited for studying linear transformations. That’s because talking about linear transformations requires talking about scaling and adding, and…hey, that’s just exactly what vectors have!
From now on, we’ll talk about linear transformations specifically on N-dimensional vector spaces (vector spaces that have dimensions and bases we can use).
Studying linear transformations
From first glance, a linear transformation’s description doesn’t look too useful or analyzable. All you have is . It could be anything! Right? Just a black box function?
But, actually, we can exploit its linearity and the fact that we’re in a vector space with a basis to analyze the heck out of any linear transformation, and see that all of them actually have to follow some specific pattern.
Let’s say that is a linear transformation from N-dimensional vector space to M-dimensional vector space . That is, .
Because we know that, once we pick a set of basis vectors , any vector in can be decomposed as , we really can just look at how a transformation acts on this decomposition. For example, if is three-dimensional:
Hm. Doesn’t seem very insightful, does it?
A simple definition
But! We can exploit the linearity of (that it distributes and scales) to rewrite that as:
Okay, take a moment to pause and take that all in. This is actually a pretty big deal! This just means that, to study , all you need to study is how acts on our basis vectors. If you know how acts on our basis vectors of our vector space, that’s really “all there is” about ! Not such a black box anymore!
That is, if I were to ask you, “Hey, what is like?”, all you’d have to tell me is the result of , , and . Just give me those three vectors, and we uniquely determine .
To put in another way, any linear transformation from a three-dimensional vector space is uniquely characterized and determined by three vectors: , , and . Those three vectors completely define .
In general, we see that any linear transformation from an N-dimensional vector space can be completely defined by N vectors: the N results of that transformation on each of N basis vectors we choose.
Enter the Matrix
Okay, so how do we “give”/define/state those N vectors?
Well, recall that the result of and , etc. are themselves vectors, in M-dimensional vector space . Let’s say that is 2-dimensional, for now.
This means that any vector in can be represented as , where and is an arbitrary choice of basis vectors.
This means that etc. can also all be represented in terms of these basis vectors. So, laying it all out:
Or, to use our bracket notation from before:
So, we now see two facts:
- A linear transformation from an N dimensional vector space to an M dimensional vector space can be defined using N vectors.
- Each of those N vectors can, themselves, be defined using M scalars each.
Our final conclusion: any linear transformation from an N dimensional vector space to an M dimensional vector space can be completely defined using scalars.
That’s right – all possible linear transformations from a 3-dimensional vector space to a 2-dimensional are parameterized by only six scalars! These six scalars uniquely determine and define our linear transformation, given a set of basis vectors that we agree on. All linear transformations can be defined/encoded/expressed with just six real numbers.
These six numbers are pretty important. Just like how we often talk about 3-dimensional vectors in terms of the encoding of their three coefficients, we often talk about linear transformations from 3-d space to 2-d space in terms of their six defining coefficients.
We group these things up in something called a matrix.
If our linear transformation from a 3-dimensional vector space to a 2-dimensional vector space is defined by:
(for arbitrary choice of bases and )
We “encode” it as the matrix:
And that’s why we use matrices in linear algebra – like how is a convenient way to represent and define a vector (once we agree on a bases), a matrix is a convenient way to represent and define a linear transformation from an N-dimensional vector space to a M-dimensional vector space (once we agree on the bases in both spaces).
Let’s look at the vector space of polynomials, which includes vectors like , etc.; scaling a polynomial just means scaling the coefficients, and adding together polynomials is just normal polynomial addition.
It’s an infinite-dimensional vector space, and one popular basis for this vector space is the polynomials , etc.
With this choice of basis, we can encode a polynomial like with the notation .
One popular linear transformation on polynomials is the derivative, . It takes and returns . In the basis we just mentioned, it takes and returns .
Now, if you don’t know what a derivative was, or how to compute it – you’re in luck! What we just found out was that if you want to completely understand a linear transformation, you just need to know how it works on the basis! You just need to know what etc. are, and you basically know everything about what the derivative does.
All I need to do is tell you that (what it does to each basis – the trusty power rule), and now you know everything you need to know about derivatives on polynomials. You can basically just skip all of calculus!
Let’s look at what this linear transformation does to each basis:
And so, in that basis, the derivative of a polynomial can be represented as the matrix:
No calculus required! (This is the core idea of the formal derivative of a polynomial)
In this light, we can understand the definition of the common matrix operations.
Matrix-vector application (or “multiplication”) is essentially the decoding of the linear transformation that the matrix represents.
Let’s look at the example. Recall that we had:
And we say that is completely defined by:
This means that:
Which is itself a vector in , so let’s write this as a combination of its components and , by distributing and rearranging terms:
And this is exactly the formula for matrix-vector multiplication!
Again, remember that what we are doing is manipulating specific encodings of our vectors and our linear transformations. Namely, we encode linear transformations as matrices, and vectors in their component encoding. The reason we can do these is that we agree upon a set of bases for our source and target vector spaces, and express these encodings in terms of those.
The magic we get out of this is that we can manipulate things in our “encoding world”, which correspond to things in the “real world”.
Addition of linear transformations
One neat thing about linear transformation is that they “add” well – you can add them together by simply applying them both and adding the results. The result is another linear transformation.
If and are linear transformations between the same vector spaces, then , as we defined it, is also one:
(Showing that it respects addition is something you can look at if you want to have some fun!)
So, if is encoded as matrix for given bases, and is encoded as matrix , what is the encoding of ?
Let’s say that, if and are 3-dimensional and 2-dimensional, respectively:
Then the breakdown of is:
Note that if we say that is encoded as matrix , and call the components , , etc., then we can rewrite that as:
Where , , etc.
So, if and encode linear transformations and , then we can encode as matrix , where the components of are just the sum of their corresponding components in and .
And that’s why we define , matrix-matrix addition, as component-wise addition: component-wise addition perfectly “simulates” the addition of the linear transformation!
What’s happening here is we can represent manipulations of the functions themselves by manipulating their encodings.
And, again, the magic here is that, by manipulating things in our “encoding world”, we can make meaningful manipulations in the “real world” of linear transformations.
Symbolically, if we write function application as matrix-vector multiplication, we say that is defined so that .
Multiplication of linear transformations
We might be tempted to define multiplication of linear transformations the same way. However, this doesn’t quite make sense.
Remember that we talked about adding linear transformations as the addition of their results. However, we can’t talk about multiplying linear transformations as the multiplication of their results because the idea of a vector space doesn’t come with any notion of multiplication.
However, even if we talk specifically about linear transformations to scalars, this still doesn’t quite work:
So, . Therefore, , defined point-wise, does not yield a linear transformation.
Therefore, there is no matrix that could would even represent or encode , as we defined it. So, since isn’t even representable as a matrix in our encoding scheme, it doesn’t make sense to treat it as a matrix operation. There’s no possible result!
Composition of linear transformations
Since linear transformations are functions, we can compose them:
Is the composition of linear transformations also a linear transformation?
Yes! (Well, once you prove that it respects addition. I’ll leave the fun to you!)
Okay, so we know that is indeed a linear transformation. That means that it can also be encoded as a matrix.
So, let’s say that , then . is a linear transformation from to , and is a linear transformation from to . That means that is a linear transformation from to . It goes from , through , and all the way to .
Let’s say that is 3-dimensional, is 2-dimensional, and is 4-dimensional.
If is encoded by the matrix , and is encoded by matrix , then we can represent as the matrix .
If you’ve taken a linear algebra class, you might recognize this pattern. Combining a and a to make a ?
We can compute using only the encodings and ! We call this matrix multiplication. It’s typically denoted as .
That’s exactly what matrix multiplication is defined as. If:
- is a matrix representing a linear transformation from a M-dimensional space to an O-dimensional space
- is an matrix representing a linear transformation from an N-dimensional space to an M-dimensional space
- is a matrix representing a linear transformation from an N-dimensional space to an O-dimensional space.
Again – manipulation of our encodings can manifest the manipulation in the linear transformations that we want.
Symbolically, if we treat function application as matrix-vector multiplication, this means that is defined such that .
In that notation, it kinda looks like the associativity of multiplication, doesn’t it? Don’t be fooled! , matrix-matrix multiplication, is a completely different type of operation than . One is the symbolic manipulation on two encodings of of a linear transformation, and the other is an application of an encoding of a linear transformation on encoding of a vector.
If you’re familiar with Haskell idioms, matrix-matrix multiplication is like
. (function composition), and matrix-vector multiplication is like
$, or function application. One is a “higher order function”: taking two functions (at least, the encodings of them) and returning a new function. The other is an application of a function to its input.
And, like in Haskell:
We won’t go over the actual process of computing the matrix-matrix product, but it’s something that you can work out just in terms of the definitions of the encodings. Just manually apply out everything and group together common factors of the basis vectors of the destination space.
The Big Picture
At the highest level here, what we’re doing is taking a function and encoding it as data – a parameterization of that function. Essentially, we take the properties of the type of functions we are looking at and find out that it can be defined/represented in a limited number of parameters
Then, the breakthrough is that we look at useful higher-order functions and manipulations of those transformations. Then, we see how we can implement those transformations by symbolically manipulating the encodings!
This is actually a dance we do all the time in programming. Instead of working with functions, we work with reified data that represent those functions. And, instead of direct higher order functions, we transform that data in a way that makes it encodes the function we want to produce.
Matrices are exactly that. Linear transformations are the functions we want to analyze, and we realize that we can completely specify/define any linear transformation with a matrix (against a choice of bases).
Then, we realize that there are some nice manipulations we can do on linear transformations; we can combine them to create new linear transformations in useful ways.
However, because those manipulations all produce new linear transformations, we know that their results can all be encoded in new matrices. So, we see if we can just directly apply those manipulations by directly working on those matrices!
I hope this post serves to demystify matrices, matrix addition, and multiplication for you, and help you see why they are defined the way that they are. Furthermore, I hope that it gives some insight on why matrices are useful in linear algebra, and also how similar encodings can help you with manipulating other types of functions!