Matrix Potpourri

Matrix Potpourri #

As part of reviewing Linear Algebra for my Machine Learning class, I’ve noticed there’s a bunch of matrix terminology that I didn’t encounter during my proof-based self-study of LA from Linear Algebra Done Right. This post is mostly intended to consolidate my own understanding and to act as a reference to future me, but if it also helps others in a similar position, that’s even better!

Note: I list all the sources from which I drew while writing this post under the “Sources” heading at the bottom of this post.

The general form of this post’s sections will be that I’ll describe an axis along which we can categorize matrices, discuss how the categorization relates to properties of linear maps, and state any interesting facts about matrices/linear maps in the category under discussion. Note that this entire discussion assumes we only care about matrices over real-valued vector spaces.

Relationship between Matrices and Linear Maps #

In order to discuss how specific types of matrices relate to linear maps, we need to first understand how matrices relate to linear maps in general! Axler covers this in more detail so I recommend referring to LADR, chapter 3 in particular, for a fuller treatment. This section assumes the reader knows what matrices and linear maps are.

For the rest of this section, assume that we have linear maps, $S$ and $T$ both with domain $V$ and range $W$. Assume $\text{dim}(V) = n$ and $\text{dim}(W) = m$ and that $v_1, v_2, \dots, v_n$ is a basis of $V$ and $w_1, w_2, \dots, w_n$ is a basis of $W$. There’s a bijection, which we’ll refer to as $M$, between matrices and linear maps, i.e. every linear map has a corresponding matrix and vice versa. A more computational way to think of this is that matrices encode linear maps’ mappings of bases in the domain to bases in the range as a 2D arrays. Using our notation, column $i$ of $ \mathcal{M}(T)$ encodes the scalar coefficients of $v_1, v_2, \dots, v_n$ that give us $Tu_i$. Further, $M(S+T) = M(S) + M(T)$ and, if we have scalar $c$, then $ \mathcal{M}(cT) = c \mathcal{M}(T) $. Thus, $\mathcal{M}$ is itself a linear map from $ \mathcal{L}(V, W) $ to $ \mathcal{F}^{n \times m} $. Last, we define matrix multiplication in the typical way and it turns out that $ \mathcal{M}(ST) = \mathcal{M}(S) \cdot \mathcal{M}(T) $.

As we know, we also want to multiply matrices by vectors. But we haven’t defined vectors as we usually think of them yet. Let’s do that now. We define $n \times 1$ vectors similar to matrices. The elements of $ \mathcal{M}(v) $ for some $v \in V$ are just the coefficiencts of the linear combination of $v_1, v_2, \dots, v_n$ that give us $v$. As a result of this, it turns out that applying a linear map is “the same” as matrix multiplication, i.e. $ \mathcal{M}(Tv) = \mathcal{M}(T) \cdot \mathcal{M}(v) $.

That’s a very brief overview of the relationship between matrices and linear maps. We’ll discuss other relationships in the following sections. For the rest of this post, I also assume the reader knows what eigenvectors and eigenvalues are.

Dimensionality #

Square Matrices #

A square matrix is a matrix with the same number of rows and columns. We often prefer to work with square matrices because they have nice properties (many of which we’ll discuss in the following sections). Technically, any linear map between two isomorphic vector spaces will have a square matrix. In practice, we typically assume that square matrices correspond to operators, i.e. linear maps over single vector spaces.

Row and Column Rank #

The row/column rank of a matrix is the number of linearly independent rows/columns. It turns out that these values are always equal, so I’ll just use the term rank to refer to both from here on out. The rank of a matrix will always equal the dimension of the range of the linear map to which the matrix corresponds. Matrix rank and dimension of a linear map’s range can be shown to be the same via the linear dependence lemma.

We refer to a matrix as “full rank” if all of its columns (and rows) are linearly independent. Since the rank of a matrix equals the dimension of its linear map’s range, by the Fundamental Theorem of Linear Maps, if matrix $M \in \mathcal{R}^{m \times n}$ is surjective if it has rank $n$ and injective if it has rank $m$. Because column and row rank are always equal, rank will always be at most $\text{min}(m, n)$. This is analogous to the fact that linear maps from smaller to larger vector spaces can never be surjective and that linear maps from larger to smaller vector spaces can never be injective.

Invertible and Singular #

Invertibility is probably the easiest concept to translate between matrix land and linear map land. A matrix is invertible if there exists some other matrix by which we can multiply it to give us the identity matrix. A linear map is invertible if there exists some other linear map which composes with it to produce the identity map. It turns that only surjective and injective linear maps invertible. Combined with the fact that surjectivity/injectivity are synonymous with full rank, this means that only full rank square matrices can have an inverse. Since non-square matrices are never invertible, we don’t have a special term for them, but we call non-invertible square matrices singular.

Symmetric #

A square matrix, $A$, is symmetric if $A = A^{T}$. For linear maps, we introduce the concept of an adjoint. Briefly, the adjoint of a linear map, $T \in \mathcal{L}(V, W)$, is the linear map, $T^{*}$ for which the inner product, $ \langle Tv, w \rangle = \langle v, T^{*}w$. An operator, $T \in \mathcal{L}(V)$ is self-adjoint if $ \langle Tu, v \rangle = \langle u, Tv \rangle $ for all $u, v \in V$. In applied linear algebra, the adjoint of a linear map’s matrix with respect to two orthonormal bases will be its transpose (technically, conjugate transpose, but this only matters for complex vector spaces). As a result, all self-adjoint operators are symmetric.

This has important ramifications due to something called the Spectral Theorem. For real vector spaces, the Spectral Theorem tells us that every self-adjoint operator has some orthonormal basis, $e$, of $V$ composed of orthonormal eigenvectors of $V$ and that $T$ has a diagonal matrix with respect to this basis. Its diagonal matrix will have eigenvalues that correspond to the orthonormal eigenvectors along its diagonal because $Te_i = \lambda_i e_i$, for all $i \in 1, \dots, n$. The Spectral Theorem also tells us the converse, that any operator with an orthonormal basis of eigenvectors is self-adjoint.

In matrix terms, the Spectral Theorem tells us that all symmetric matrices are diagonalizable. From a computational perspective, this means any symmetric matrix can be converted into a diagonal matrix via a change of basis.

Orthogonal #

An orthogonal matrix is a square matrix with mutually orthonormal columns/rows. Operators with orthogonal matrices are always isometries. An isometry is an operator that preserves norms, i.e. for $S \in \mathcal{L}(V)$, $\lVert Sv \rVert = \lVert v \rVert $. All isometries are normal and their adjoints are their inverses, i.e. $ SS^{*} = S^{*}S = I $.

As a result, the inverse of an orthogonal matrix is just its transpose. In addition, because normal operators’ eigenvectors are always mutually orthonormal, we can easily do an eigendecomposition of any symmetric matric by breaking it down into an orthogonal matrix with eigenvectors as columns and a diagonal matrix with the corresponding eigenvalues along the diagonal (there’s an analogous version of this claim for normal matrices over complex vector spaces).

Positive and Definite #

A symmetric matrix, $A$ is positive semidefinite, if, for all $x \in \mathcal{R}^n$, $x^T A x \geq 0$. A self-adjoint operator, $ T \in \mathcal{L} $, is positive if $ \langle Tv, v \rangle \geq 0 $ for all $ v \in V $. Positive definite matrices are matrices where the same expression is always strictly greater than 0. Negative definiteness is defined as expected.

As far as I know, the most important thing to remember about positive (semi-)definite matrices is that they have positive eigenvalues and are always invertible.

Potential Future Work #

I mostly focused on matrix to linear map property mappings and only glossed over stuff about eigenvalues and eigenvectors. I’m considering focusing more on eigenvalues, eigenvectors, and the SVD in a future post.

Conclusion #

Hopefully this post proves helpful for future me and potentially someone else. From a mathematician’s perspective, I’m sure the fact that I barely proved any of my statements would be horrifying, but thankfully my blog isn’t exactly widely trafficked by mathematicians. That said, I also am interested in going back and making all of these connections more formal when I have more time.

Sources #

Linear Algebra Done Right: this was the primary text from which I drew results from proof-based LA for this post.
A Programmer’s Introduction to Mathematics by Jeremy Kun: The only thing in this post that directly draws from this book is the brief mention of thinking of matrices as change of bases. That said, I definitely owe a lot to Jeremy’s treatment of Linear Algebra in that book in terms of giving me a better bridge to connect my understanding of matrix stuff to linear map stuff. Also, Jeremy’s blog is my favorite math+programming blog on the internet.
Stanford’s CS 229 Linear Algebra Review: This was the primary source from which I drew most of the matrix category names and definitions that I discuss in this post. I found this review more helpful than the one from the Deep Learning book.
Wikipedia: Honestly, what blog post doesn’t use Wikipedia in some form these days…