You Could Have Invented Matrices!
by Justin Le
Let’s talk about vectors. A vector (denoted as $\vec{x}$) is an element of a vector space, which just means it is something that can be scaled (written $c\,\vec{x}$, where $c$ is called a scalar) and added to other vectors (written $\vec{x} + \vec{y}$).
In order for vector spaces and their operations to be valid, they just have to obey some common-sense rules (like associativity, commutativity, distributivity, etc.) that allow us to make meaningful conclusions.
Dimensionality
One neat thing about vector spaces is that, in some of them, you have the ability to “decompose” any vector in it as a weighted sum of some set of basis vectors. If this is the case for your vector space, then the size of the smallest possible set of basis vectors is known as the dimension of that vector space.
For example, for a 3-dimensional vector space, we can decompose any vector $\vec{x}$ in it as a weighted sum of three basis vectors:

$$
\vec{x} = x_1\, \vec{e}_1 + x_2\, \vec{e}_2 + x_3\, \vec{e}_3
$$

Where $x_1$, $x_2$, and $x_3$ are scalars and $\vec{e}_1$, $\vec{e}_2$, and $\vec{e}_3$ are our chosen basis vectors.
Dimensionality is really a statement about being able to decompose any vector in that vector space into a combination of a set of basis vectors. For a 3-dimensional vector space, you can make a basis that can reproduce any vector in your space…but that’s only possible with at least three vectors. For a 4-dimensional vector space, you’d need at least four vectors.
For example, in physics, we often treat reality as taking place in a three-dimensional vector space. The basis vectors are often called $\hat{x}$, $\hat{y}$, and $\hat{z}$, and any vector (a velocity, a force) can be written as a weighted sum of those three.
Encoding
One neat thing that physicists take advantage of all the time is that if we agree on a set of basis vectors and a specific ordering, we can actually encode any vector as just the ordered list of the scalars in its decomposition.
In physics, for instance, we can say “Let’s encode vectors in terms of sums of scalings of $\hat{x}$, $\hat{y}$, and $\hat{z}$, in that order.” Then the vector $3\hat{x} + 2\hat{y} + 7\hat{z}$ can be encoded as just $\langle 3, 2, 7 \rangle$.

It should be made clear that $\langle 3, 2, 7 \rangle$ is not the vector itself; it is only an encoding of that vector.
In the general case, for an N-dimensional vector space, this means that, with a minimum of N items, we can represent any vector in that space. And, if we agree on those N items (basis vectors $\vec{e}_1, \vec{e}_2 \ldots \vec{e}_N$, in a fixed order), we can devise an encoding, such that:

$$
\langle x_1, x_2 \ldots x_N \rangle
$$

will encode the vector:

$$
x_1\, \vec{e}_1 + x_2\, \vec{e}_2 + \ldots + x_N\, \vec{e}_N
$$
Note that what this encoding represents is completely dependent on what basis vectors $\vec{e}_1 \ldots \vec{e}_N$ (and what ordering) we agree on; the list $\langle 3, 2, 7 \rangle$ means nothing on its own.

To highlight this, note that the same vector is encoded by completely different lists of scalars if we pick a different basis, or even just a different ordering of the same basis vectors.
One interesting consequence of this is that any N-dimensional vector space whose scalars are in $\mathbb{R}$ can be put in one-to-one correspondence with $\mathbb{R}^N$, the lists of N real numbers – once we fix a basis, every vector has exactly one encoding, and every encoding names exactly one vector.
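To make that “agree on a basis, then pass around lists of scalars” idea concrete, here is a minimal Haskell sketch (the names `V3`, `basis`, and `decode` are my own, not anything from the post):

```haskell
-- 3-dimensional vectors as triples of scalars, with the two
-- vector-space operations: scaling and addition.
data V3 = V3 Double Double Double
  deriving Show

scale :: Double -> V3 -> V3
scale c (V3 x y z) = V3 (c * x) (c * y) (c * z)

addV :: V3 -> V3 -> V3
addV (V3 x1 y1 z1) (V3 x2 y2 z2) = V3 (x1 + x2) (y1 + y2) (z1 + z2)

-- The basis we all agree on, in an agreed-upon order.
basis :: [V3]
basis = [V3 1 0 0, V3 0 1 0, V3 0 0 1]

-- "Decoding" a list of coefficients: the weighted sum of basis vectors.
-- decode [3, 2, 7]  ==  V3 3.0 2.0 7.0
decode :: [Double] -> V3
decode coeffs = foldr addV (V3 0 0 0) (zipWith scale coeffs basis)
```

The point of the sketch is that the list $\langle 3, 2, 7 \rangle$ only means something once `basis` is fixed; change `basis` and the same list decodes to a different vector.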
Linear Transformations
Now, one of the most interesting things in mathematics is the idea of the linear transformation. Linear transformations are useful to study because:
- They are ubiquitous. They come up everywhere in engineering, physics, mathematics, data science, economics, and pretty much any mathematical theory. And there are even more situations which can be approximated by linear transformations.
- They are mathematically very nice to work with and study, in practice.
A linear transformation, $f$, is a function from one vector space to another that preserves scaling and addition: $f(c\,\vec{x}) = c\, f(\vec{x})$, and $f(\vec{x} + \vec{y}) = f(\vec{x}) + f(\vec{y})$.
This means that if you scale the input, the output is scaled by the same amount. And also, if you transform the sum of two things, it’s the same as the sum of the transformed things (it “distributes”).
Note that I snuck in vector notation, because vectors are perfectly suited for studying linear transformations. That’s because talking about linear transformations requires talking about scaling and adding, and…hey, that’s just exactly what vectors have!
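For a concrete feel (an example of my own, not from the original argument, written with 2-dimensional encodings): the map $f(\langle x_1, x_2 \rangle) = \langle x_1 + x_2,\ 2 x_1 \rangle$ is linear. Scaling the input scales the output by the same amount:

$$
f(c\, \langle x_1, x_2 \rangle) = \langle c\, x_1 + c\, x_2,\ 2 c\, x_1 \rangle = c\, \langle x_1 + x_2,\ 2 x_1 \rangle = c\, f(\langle x_1, x_2 \rangle)
$$

(and the same kind of check works for sums). Something like $f(\langle x_1, x_2 \rangle) = \langle x_1^2,\ x_2 \rangle$ fails the scaling test, so it is not linear.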
From now on, we’ll talk about linear transformations specifically on N-dimensional vector spaces (vector spaces that have dimensions and bases we can use).
Studying linear transformations
At first glance, a linear transformation’s description doesn’t look too useful or analyzable. All you have is some function $f$ that respects scaling and addition – that’s it.
But, actually, we can exploit its linearity and the fact that we’re in a vector space with a basis to analyze the heck out of any linear transformation, and see that all of them actually have to follow some specific pattern.
Let’s say that $f$ is a linear transformation from a 3-dimensional vector space to some target vector space, and we want to figure out what $f(\vec{x})$ is for an arbitrary vector $\vec{x}$.

Because we know that, once we pick a set of basis vectors $\vec{e}_1, \vec{e}_2, \vec{e}_3$, any vector $\vec{x}$ can be written as $x_1\, \vec{e}_1 + x_2\, \vec{e}_2 + x_3\, \vec{e}_3$, the most we can say so far is:

$$
f(\vec{x}) = f(x_1\, \vec{e}_1 + x_2\, \vec{e}_2 + x_3\, \vec{e}_3)
$$
Hm. Doesn’t seem very insightful, does it?
A simple definition
But! We can exploit the linearity of $f$ – the fact that it respects addition and scaling – to expand that out:

$$
f(\vec{x}) = f(x_1\, \vec{e}_1 + x_2\, \vec{e}_2 + x_3\, \vec{e}_3) = x_1\, f(\vec{e}_1) + x_2\, f(\vec{e}_2) + x_3\, f(\vec{e}_3)
$$

Okay, take a moment to pause and take that all in. This is actually a pretty big deal! This just means that, to study $f$, all you have to know is what $f$ does to the basis vectors: $f(\vec{e}_1)$, $f(\vec{e}_2)$, and $f(\vec{e}_3)$. Every other result of $f$ is just a weighted sum of those three vectors.

That is, if I were to ask you, “Hey, what is $f(\vec{x})$?”, you could answer me as long as you know $f(\vec{e}_1)$, $f(\vec{e}_2)$, and $f(\vec{e}_3)$: the answer is just $x_1\, f(\vec{e}_1) + x_2\, f(\vec{e}_2) + x_3\, f(\vec{e}_3)$.

To put it another way, any linear transformation from a three-dimensional vector space is uniquely characterized and determined by three vectors: $f(\vec{e}_1)$, $f(\vec{e}_2)$, and $f(\vec{e}_3)$.
In general, we see that any linear transformation from an N-dimensional vector space can be completely defined by N vectors: the N results of that transformation on each of N basis vectors we choose.
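The same reasoning, written out as a formula for a general N-dimensional vector space:

$$
f(\vec{x}) = f(x_1\, \vec{e}_1 + x_2\, \vec{e}_2 + \ldots + x_N\, \vec{e}_N) = x_1\, f(\vec{e}_1) + x_2\, f(\vec{e}_2) + \ldots + x_N\, f(\vec{e}_N)
$$

Every output of $f$ is a weighted sum of the same N vectors $f(\vec{e}_1) \ldots f(\vec{e}_N)$, and the weights are just the encoding of the input.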
Enter the Matrix
Okay, so how do we “give”/define/state those N vectors?
Well, recall that the result of $f(\vec{e}_i)$ – applying $f$ to a basis vector – is itself a vector in the target vector space, which we’ll say is M-dimensional, with its own agreed-upon basis $\vec{q}_1 \ldots \vec{q}_M$.

This means that any vector $f(\vec{e}_i)$ can be decomposed as a weighted sum of those M basis vectors.

This means that $f(\vec{e}_i)$ can itself be encoded as a list of M scalars.

Or, to use our bracket notation from before: each $f(\vec{e}_i)$ can be written down as some $\langle a_1, a_2 \ldots a_M \rangle$.
So, we now see two facts:
- A linear transformation from an N dimensional vector space to an M dimensional vector space can be defined using N vectors.
- Each of those N vectors can, themselves, be defined using M scalars each.
Our final conclusion: any linear transformation from an N-dimensional vector space to an M-dimensional vector space can be completely defined using $N \times M$ scalars.
That’s right – all possible linear transformations from a 3-dimensional vector space to a 2-dimensional one are parameterized by only six scalars! These six scalars uniquely determine and define our linear transformation, given a set of basis vectors that we agree on. All such linear transformations can be described by giving just those six numbers.
These six numbers are pretty important. Just like how we often talk about 3-dimensional vectors in terms of the encoding of their three coefficients, we often talk about linear transformations from 3-d space to 2-d space in terms of their six defining coefficients.
We group these things up in something called a matrix.
If our linear transformation $f$ from a 3-dimensional vector space to a 2-dimensional vector space acts on our basis like:

$$
\begin{aligned}
f(\vec{e}_1) &= a_{11}\, \vec{q}_1 + a_{21}\, \vec{q}_2 \\
f(\vec{e}_2) &= a_{12}\, \vec{q}_1 + a_{22}\, \vec{q}_2 \\
f(\vec{e}_3) &= a_{13}\, \vec{q}_1 + a_{23}\, \vec{q}_2
\end{aligned}
$$

(for an arbitrary choice of bases $\vec{e}_1, \vec{e}_2, \vec{e}_3$ for the source space and $\vec{q}_1, \vec{q}_2$ for the target space)

We “encode” it as the matrix:

$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
$$
And that’s why we use matrices in linear algebra – like how $\langle x_1, x_2, x_3 \rangle$ is just a convenient encoding of a vector (relative to a basis), a matrix is just a convenient encoding of a linear transformation (relative to a choice of bases for the source and target spaces).
Example
Let’s look at the vector space of polynomials, which includes vectors like $1 + 2x + 3x^2$ and $5x - x^3$. (You can scale polynomials and add them together, and the result is still a polynomial, so they form a perfectly good vector space.)

It’s an infinite-dimensional vector space, and one popular basis for this vector space is the polynomials $1, x, x^2, x^3 \ldots$ and so on.

With this choice of basis, we can encode a polynomial like $1 + 2x + 3x^2$ as $\langle 1, 2, 3, 0, 0 \ldots \rangle$ (with zeroes for every higher power).

One popular linear transformation on polynomials is the derivative, $\frac{d}{dx}$, which takes a polynomial to another polynomial.

Now, if you don’t know what a derivative is, or how to compute it – you’re in luck! What we just found out was that if you want to completely understand a linear transformation, you just need to know how it works on the basis! You just need to know what $\frac{d}{dx}$ does to $1$, $x$, $x^2$, $x^3$, and so on.

All I need to do is tell you that $\frac{d}{dx}$ takes $x^n$ to $n\, x^{n-1}$ (and takes $1$ to $0$), and you know everything there is to know about it.

Let’s look at what this linear transformation does to each basis vector:

$$
\frac{d}{dx}\, 1 = 0, \qquad
\frac{d}{dx}\, x = 1, \qquad
\frac{d}{dx}\, x^2 = 2x, \qquad
\frac{d}{dx}\, x^3 = 3x^2, \qquad \ldots
$$

And so, in that $1, x, x^2, x^3 \ldots$ basis, the derivative of a polynomial encoded as $\langle c_0, c_1, c_2, c_3 \ldots \rangle$ is the polynomial encoded as $\langle c_1, 2 c_2, 3 c_3 \ldots \rangle$ – just the basis results scaled by the original coefficients and summed up.
No calculus required! (This is the core idea of the formal derivative of a polynomial)
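As a tiny Haskell sketch of that last paragraph (my own encoding: a polynomial as its list of coefficients against the $1, x, x^2 \ldots$ basis):

```haskell
-- A polynomial, encoded as its coefficients against 1, x, x^2, x^3, ...
-- e.g. [1, 2, 3] encodes 1 + 2x + 3x^2.
type Poly = [Double]

-- The "formal derivative": defined purely by what d/dx does to each
-- basis polynomial (x^n goes to n * x^(n-1)), no limits needed.
-- formalDeriv [1, 2, 3]  ==  [2.0, 6.0]   -- i.e. 2 + 6x
formalDeriv :: Poly -> Poly
formalDeriv []         = []
formalDeriv (_ : rest) = zipWith (*) [1 ..] rest
```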
Matrix Operations
In this light, we can understand the definition of the common matrix operations.
Matrix-Vector Application
Matrix-vector application (or “multiplication”) is essentially the decoding of the linear transformation that the matrix represents.
Let’s look at the same situation as before: a linear transformation $f$ from a 3-dimensional space (with basis $\vec{e}_1, \vec{e}_2, \vec{e}_3$) to a 2-dimensional space (with basis $\vec{q}_1, \vec{q}_2$), encoded as the six scalars $a_{11} \ldots a_{23}$ from the matrix above.

And we say that applying that matrix to the encoding $\langle x_1, x_2, x_3 \rangle$ of a vector $\vec{x}$ should give us back the encoding of $f(\vec{x})$.

This means that:

$$
f(\vec{x}) = x_1\, f(\vec{e}_1) + x_2\, f(\vec{e}_2) + x_3\, f(\vec{e}_3)
= (a_{11} x_1 + a_{12} x_2 + a_{13} x_3)\, \vec{q}_1 + (a_{21} x_1 + a_{22} x_2 + a_{23} x_3)\, \vec{q}_2
$$

Which is itself a vector in the 2-dimensional target space, whose encoding is $\langle a_{11} x_1 + a_{12} x_2 + a_{13} x_3,\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 \rangle$.
And this is exactly the formula for matrix-vector multiplication!
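In code, that decoding step is a one-liner. A sketch with my own (made-up) representation – a matrix as a list of rows, a vector as its list of coefficients:

```haskell
type Vec    = [Double]
type Matrix = [Vec]   -- a list of rows

-- Matrix-vector application: each component of the result is one row
-- of the matrix combined with the input encoding, exactly the sums above.
-- matVec [[1, 2, 3], [4, 5, 6]] [1, 0, 2]  ==  [7.0, 16.0]
matVec :: Matrix -> Vec -> Vec
matVec m v = map (sum . zipWith (*) v) m
```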
Again, remember that what we are doing is manipulating specific encodings of our vectors and our linear transformations. Namely, we encode linear transformations as matrices, and vectors in their component encoding. The reason we can do this is that we agree upon a set of bases for our source and target vector spaces, and express these encodings in terms of those.
The magic we get out of this is that we can manipulate things in our “encoding world”, which correspond to things in the “real world”.
Addition of linear transformations
One neat thing about linear transformations is that they “add” well – you can add them together by simply applying them both and adding the results. The result is another linear transformation.
If $f$ and $g$ are both linear transformations between the same two vector spaces, we can define their sum $f + g$ by $(f + g)(\vec{x}) = f(\vec{x}) + g(\vec{x})$. The result respects scaling: $(f + g)(c\,\vec{x}) = f(c\,\vec{x}) + g(c\,\vec{x}) = c\, f(\vec{x}) + c\, g(\vec{x}) = c\, (f + g)(\vec{x})$.
(Showing that it respects addition is something you can look at if you want to have some fun!)
So, if $f + g$ is itself a linear transformation, it, too, must be encodable as a matrix. Which matrix?

Let’s say that $f$ and $g$ act on the first basis vector like $f(\vec{e}_1) = a_{11}\, \vec{q}_1 + a_{21}\, \vec{q}_2$ and $g(\vec{e}_1) = b_{11}\, \vec{q}_1 + b_{21}\, \vec{q}_2$.

Then the breakdown of $(f + g)(\vec{e}_1)$ is:

$$
(f + g)(\vec{e}_1) = f(\vec{e}_1) + g(\vec{e}_1) = (a_{11} + b_{11})\, \vec{q}_1 + (a_{21} + b_{21})\, \vec{q}_2
$$

Note that if we say that the matrix $A$ (with entries $a_{ij}$) encodes $f$ and the matrix $B$ (with entries $b_{ij}$) encodes $g$, the exact same reasoning applies to every other basis vector $\vec{e}_j$.

Where $f$ contributes $a_{ij}$ to the $\vec{q}_i$ component of $f(\vec{e}_j)$ and $g$ contributes $b_{ij}$, the sum $f + g$ contributes $a_{ij} + b_{ij}$.

So, if we want the matrix encoding $f + g$, we can get it by just adding the corresponding entries of $A$ and $B$.

And that’s why we define matrix addition to be component-wise addition: component-wise addition of the encodings corresponds to addition of the underlying linear transformations.
What’s happening here is that we can represent manipulations of the functions themselves by manipulating their encodings.
And, again, the magic here is that, by manipulating things in our “encoding world”, we can make meaningful manipulations in the “real world” of linear transformations.
Symbolically, if we write function application as matrix-vector multiplication, we say that $(A + B)\,\vec{x} = A\,\vec{x} + B\,\vec{x}$.
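As a sketch in the same list-of-rows representation as before (again, my own names, not the post’s):

```haskell
type Matrix = [[Double]]   -- a list of rows

-- Adding the encodings entry-by-entry encodes the sum of the
-- underlying linear transformations.
-- matAdd [[1, 2], [3, 4]] [[10, 20], [30, 40]]  ==  [[11.0, 22.0], [33.0, 44.0]]
matAdd :: Matrix -> Matrix -> Matrix
matAdd = zipWith (zipWith (+))
```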
Multiplication of linear transformations
We might be tempted to define multiplication of linear transformations the same way. However, this doesn’t quite make sense.
Remember that we talked about adding linear transformations as the addition of their results. However, we can’t talk about multiplying linear transformations as the multiplication of their results because the idea of a vector space doesn’t come with any notion of multiplication.
However, even if we talk specifically about linear transformations to scalars (where multiplying the results does make sense), this still doesn’t quite work. If we try $(f \cdot g)(\vec{x}) = f(\vec{x})\, g(\vec{x})$, then scaling the input gives $(f \cdot g)(c\,\vec{x}) = f(c\,\vec{x})\, g(c\,\vec{x}) = c^2\, f(\vec{x})\, g(\vec{x})$, not $c\, (f \cdot g)(\vec{x})$.

So, $f \cdot g$ does not respect scaling – it isn’t a linear transformation at all.

Therefore, there is no matrix that could even represent or encode $f \cdot g$ in the first place.
Composition of linear transformations
Since linear transformations are functions, we can compose them: $(f \circ g)(\vec{x}) = f(g(\vec{x}))$.
Is the composition of linear transformations also a linear transformation?
Yes! It respects scaling, because $(f \circ g)(c\,\vec{x}) = f(g(c\,\vec{x})) = f(c\, g(\vec{x})) = c\, f(g(\vec{x})) = c\, (f \circ g)(\vec{x})$. (Well, once you also prove that it respects addition. I’ll leave the fun to you!)
Okay, so we know that $f \circ g$ is itself a linear transformation, which means it, too, can be encoded as a matrix. Which one?
So, let’s say that $g$ is a linear transformation from an N-dimensional vector space to an M-dimensional one, encoded as the $M \times N$ matrix $B$.

Let’s say that $f$ is a linear transformation from that M-dimensional space to an O-dimensional one, encoded as the $O \times M$ matrix $A$.

If we compose them, $f \circ g$ takes vectors in the N-dimensional space straight to vectors in the O-dimensional space, so its encoding has to be an $O \times N$ matrix.

If you’ve taken a linear algebra class, you might recognize this pattern. Combining an $O \times M$ matrix with an $M \times N$ matrix gives you an $O \times N$ matrix.

We can compute that $O \times N$ matrix directly from the entries of $A$ and $B$, by expanding out what $f(g(\vec{e}_j))$ is for each basis vector and regrouping.
That’s exactly what matrix multiplication is defined as. If:
- $A$ is an $O \times M$ matrix representing a linear transformation from an M-dimensional space to an O-dimensional space
- $B$ is an $M \times N$ matrix representing a linear transformation from an N-dimensional space to an M-dimensional space
Then:
$A B$, their product, is an $O \times N$ matrix representing a linear transformation from an N-dimensional space to an O-dimensional space.
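As a sketch (same hypothetical list-of-rows representation as the earlier snippets):

```haskell
import Data.List (transpose)

type Matrix = [[Double]]   -- a list of rows

-- Matrix-matrix multiplication: the encoding of the composition.
-- Each entry pairs a row of the first matrix with a column of the second.
-- matMul [[1, 0, 2], [0, 1, 1]] [[1, 2], [3, 4], [5, 6]]
--   ==  [[11.0, 14.0], [8.0, 10.0]]
matMul :: Matrix -> Matrix -> Matrix
matMul a b = [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]
```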
Again – manipulation of our encodings can manifest the manipulation in the linear transformations that we want.
Symbolically, if we treat function application as matrix-vector multiplication, this means that $(A B)\,\vec{x} = A\,(B\,\vec{x})$ for every vector $\vec{x}$.
In that notation, it kinda looks like the associativity of multiplication, doesn’t it? Don’t be fooled!
If you’re familiar with Haskell idioms, matrix-matrix multiplication is like `(.)` (function composition), and matrix-vector multiplication is like `($)`, or function application. One is a “higher order function”: taking two functions (at least, the encodings of them) and returning a new function. The other is an application of a function to its input.

And, like in Haskell:

`(f . g) x = f (g x)`
We won’t go over the actual process of computing the matrix-matrix product, but it’s something that you can work out just in terms of the definitions of the encodings. Just manually apply out everything and group together common factors of the basis vectors of the destination space.
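For the curious, here is that “apply everything out and group” step carried out for a single basis vector, in my notation: $B$ (entries $b_{kj}$) encodes $g$, $A$ (entries $a_{ik}$) encodes $f$, the middle space has basis $\vec{q}_1 \ldots \vec{q}_M$, and the target space has basis $\vec{r}_1 \ldots \vec{r}_O$:

$$
f(g(\vec{e}_j)) = f\left(\sum_k b_{kj}\, \vec{q}_k\right)
= \sum_k b_{kj}\, f(\vec{q}_k)
= \sum_k b_{kj} \sum_i a_{ik}\, \vec{r}_i
= \sum_i \left(\sum_k a_{ik}\, b_{kj}\right) \vec{r}_i
$$

So the $(i, j)$ entry of the matrix encoding $f \circ g$ is $\sum_k a_{ik}\, b_{kj}$ – the familiar “row times column” rule.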
The Big Picture
At the highest level here, what we’re doing is taking a function and encoding it as data – a parameterization of that function. Essentially, we take the properties of the type of functions we are looking at and find out that they can be defined/represented by a limited number of parameters.
Then, the breakthrough is that we look at useful higher-order functions and manipulations of those transformations, and we see how we can implement those manipulations by symbolically manipulating the encodings!

This is actually a dance we do all the time in programming. Instead of working with functions, we work with reified data that represent those functions. And, instead of direct higher-order functions, we transform that data in a way that makes it encode the function we want to produce.
Matrices are exactly that. Linear transformations are the functions we want to analyze, and we realize that we can completely specify/define any linear transformation with a matrix (against a choice of bases).
Then, we realize that there are some nice manipulations we can do on linear transformations; we can combine them to create new linear transformations in useful ways.
However, because those manipulations all produce new linear transformations, we know that their results can all be encoded in new matrices. So, we see if we can just directly apply those manipulations by directly working on those matrices!
I hope this post serves to demystify matrices, matrix addition, and multiplication for you, and help you see why they are defined the way that they are. Furthermore, I hope that it gives some insight on why matrices are useful in linear algebra, and also how similar encodings can help you with manipulating other types of functions!