2 Matrix Algebra

[latexpage]

Introduction

In the study of systems of linear equations in Chapter 1, we found it convenient to manipulate the augmented matrix of the system. Our aim was to reduce it to row-echelon form (using elementary row operations) and hence to write down all solutions to the system. In the present chapter we consider matrices for their own sake. While some of the motivation comes from linear equations, it turns out that matrices can be multiplied and added and so form an algebraic system somewhat analogous to the real numbers. This “matrix algebra” is useful in ways that are quite different from the study of linear equations. For example, the geometrical transformations obtained by rotating the euclidean plane about the origin can be viewed as multiplications by certain $2 \times 2$ matrices. These “matrix transformations” are an important tool in geometry and, in turn, the geometry provides a “picture” of the matrices. Furthermore, matrix algebra has many other applications, some of which will be explored in this chapter. This subject is quite old and was first studied systematically in 1858 by Arthur Cayley.

Arthur Cayley (1821-1895) showed his mathematical talent early and graduated from Cambridge in 1842 as senior wrangler. With no employment in mathematics in view, he took legal training and worked as a lawyer while continuing to do mathematics, publishing nearly 300 papers in fourteen years. Finally, in 1863, he accepted the Sadlerian professorship in Cambridge and remained there for the rest of his life, valued for his administrative and teaching skills as well as for his scholarship. His mathematical achievements were of the first rank. In addition to originating matrix theory and the theory of determinants, he did fundamental work in group theory, in higher-dimensional geometry, and in the theory of invariants. He was one of the most prolific mathematicians of all time and produced 966 papers.

2.1 Matrix Addition, Scalar Multiplication, and Transposition

A rectangular array of numbers is called a matrix (the plural is matrices), and the numbers are called the entries of the matrix. Matrices are usually denoted by uppercase letters: $A$, $B$, $C$, and so on. Hence,
\begin{equation*}
A = \left[ \begin{array}{rrr}
1 & 2 & -1 \\
0 & 5 & 6
\end{array} \right] \quad
B = \left[ \begin{array}{rr}
1 & -1 \\
0 & 2
\end{array} \right] \quad
C = \left[ \begin{array}{r}
1 \\
3 \\
2
\end{array} \right]
\end{equation*}
are matrices. Clearly matrices come in various shapes depending on the number of rows and columns. For example, the matrix $A$ shown has $2$ rows and $3$ columns. In general, a matrix with $m$ rows and $n$ columns is referred to as an $\bm{m} \times \bm{n}$ matrix or as having size $\bm{m} \times \bm{n}$. Thus matrices $A$, $B$, and $C$ above have sizes $2 \times 3$, $2 \times 2$, and $3 \times 1$, respectively. A matrix of size $1 \times n$ is called a row matrix, whereas one of size $m \times 1$ is called a column matrix. Matrices of size $n \times n$ for some $n$ are called square matrices.

Each entry of a matrix is identified by the row and column in which it lies. The rows are numbered from the top down, and the columns are numbered from left to right. Then the $\bm{(i, j)}$-entry of a matrix is the number lying simultaneously in row $i$ and column $j$. For example,
\begin{align*}
\mbox{The } (1, 2) \mbox{-entry of } &\left[ \begin{array}{rr}
1 & -1 \\
0 & 1
\end{array}\right] \mbox{ is } -1. \\
\mbox{The } (2, 3) \mbox{-entry of } &\left[ \begin{array}{rrr}
1 & 2 & -1 \\
0 & 5 & 6
\end{array}\right] \mbox{ is } 6.
\end{align*}

A special notation is commonly used for the entries of a matrix. If $A$ is an $m \times n$ matrix, and if the $(i, j)$-entry of $A$ is denoted as $a_{ij}$, then $A$ is displayed as follows:
\begin{equation*}
A = \left[ \begin{array}{ccccc}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{array} \right]
\end{equation*}
This is usually denoted simply as $A = \left[ a_{ij} \right]$. Thus $a_{ij}$ is the entry in row $i$ and column $j$ of $A$. For example, a $3 \times 4$ matrix in this notation is written
\begin{equation*}
A = \left[ \begin{array}{cccc}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{array} \right]
\end{equation*}
It is worth pointing out a convention regarding rows and columns: Rows are mentioned before columns. For example:

  • If a matrix has size $m \times n$, it has $m$ rows and $n$ columns.
  • If we speak of the $(i, j)$-entry of a matrix, it lies in row $i$ and column $j$.
  • If an entry is denoted $a_{ij}$, the first subscript $i$ refers to the row and the second subscript $j$ to the column in which $a_{ij}$ lies.

Two points $(x_{1}, y_{1})$ and $(x_{2}, y_{2})$ in the plane are equal if and only if they have the same coordinates, that is $x_{1} = x_{2}$ and $y_{1} = y_{2}$. Similarly, two matrices $A$ and $B$ are called equal (written $A = B$) if and only if:

  1. They have the same size.
  2. Corresponding entries are equal.

If the entries of $A$ and $B$ are written in the form $A = \left[ a_{ij} \right]$, $B = \left[ b_{ij} \right]$, described earlier, then the second condition takes the following form:
\begin{equation*}
A = \left[ a_{ij} \right] = \left[ b_{ij} \right] \mbox{ means } a_{ij} = b_{ij} \mbox{ for all } i \mbox{ and } j
\end{equation*}

Example 2.1.1

Given $A = \left[ \begin{array}{cc}
a & b \\
c & d
\end{array}
\right]$, $B = \left[ \begin{array}{rrr}
1 & 2 & -1 \\
3 & 0 & 1
\end{array}
\right]$ and
$C = \left[ \begin{array}{rr}
1 & 0 \\
-1 & 2
\end{array}
\right]$
discuss the possibility that $A = B$, $B = C$, $A = C$.

Solution:

$A = B$ is impossible because $A$ and $B$ are of different sizes: $A$ is $2 \times 2$ whereas $B$ is $2 \times 3$. Similarly, $B = C$ is impossible. But $A = C$ is possible provided that corresponding entries are equal:
$\left[ \begin{array}{cc}
a & b \\
c & d
\end{array} \right] =
\left[ \begin{array}{rr}
1 & 0 \\
-1 & 2
\end{array} \right]$
means $a = 1$, $b = 0$, $c = -1$, and $d = 2$.

Matrix Addition

Definition 2.1 Matrix Addition

If $A$ and $B$ are matrices of the same size, their sum $A + B$ is the matrix formed by adding corresponding entries.

If $A = \left[ a_{ij} \right]$ and $B = \left[ b_{ij} \right]$, this takes the form
\begin{equation*}
A + B = \left[ a_{ij} + b_{ij} \right]
\end{equation*}
Note that addition is not defined for matrices of different sizes.

 

Example 2.1.2

If $A = \left[ \begin{array}{rrr}
2 & 1 & 3 \\
-1 & 2 & 0
\end{array} \right]$
and $B = \left[ \begin{array}{rrr}
1 & 1 & -1 \\
2 & 0 & 6
\end{array} \right]$,
compute $A + B$.

Solution:

\begin{equation*}
A + B = \left[ \begin{array}{rrr}
2 + 1 & 1 + 1 & 3 - 1 \\
-1 + 2 & 2 + 0 & 0 + 6
\end{array} \right]
=
\left[ \begin{array}{rrr}
3 & 2 & 2 \\
1 & 2 & 6
\end{array} \right]
\end{equation*}
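
The entrywise arithmetic in Example 2.1.2 is easy to check with software. Here is a brief sketch in Python using the NumPy library (the array names are ours, chosen for illustration):
\begin{verbatim}
import numpy as np

A = np.array([[ 2, 1, 3],
              [-1, 2, 0]])
B = np.array([[ 1, 1, -1],
              [ 2, 0,  6]])

print(A + B)   # [[3 2 2]
               #  [1 2 6]]
\end{verbatim}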

Example 2.1.3

Find $a$, $b$, and $c$ if $\left[ \begin{array}{ccc}
a & b & c
\end{array} \right] + \left[
\begin{array}{ccc}
c & a & b
\end{array} \right]
= \left[ \begin{array}{ccc}
3 & 2 & -1
\end{array} \right]$.

Solution:

Add the matrices on the left side to obtain
\begin{equation*}
\left[ \begin{array}{ccc}
a + c & b + a & c + b
\end{array} \right]
=
\left[ \begin{array}{rrr}
3 & 2 & -1
\end{array} \right]
\end{equation*}
Because corresponding entries must be equal, this gives three equations: $a + c = 3$, $b + a = 2$, and $c + b = -1$. Solving these yields $a = 3$, $b = -1$, $c = 0$.

 

 

If $A$, $B$, and $C$ are any matrices of the same size, then
\begin{align}
A + B &= B + A \tag{commutative law} \\
A + (B + C) &= (A + B) + C \tag{associative law}
\end{align}
In fact, if $A = \left[ a_{ij} \right]$ and $B = \left[ b_{ij} \right]$, then the $(i, j)$-entries of $A + B$ and $B + A$ are, respectively, $a_{ij} + b_{ij}$ and $b_{ij} + a_{ij}$. Since these are equal for all $i$ and $j$, we get
\begin{equation*}
A + B = \left[ \begin{array}{c}
a_{ij} + b_{ij}
\end{array} \right]
=
\left[ \begin{array}{c}
b_{ij} + a_{ij}
\end{array} \right]
= B + A
\end{equation*}
The associative law is verified similarly.

The $m \times n$ matrix in which every entry is zero is called the $m \times n$ zero matrix and is denoted as $0$ (or $0_{mn}$ if it is important to emphasize the size). Hence,
\begin{equation*}
0 + X = X
\end{equation*}
holds for all $m \times n$ matrices $X$. The negative of an $m \times n$ matrix $A$ (written $-A$) is defined to be the $m \times n$ matrix obtained by multiplying each entry of $A$ by $-1$. If $A = \left[ a_{ij} \right]$, this becomes $-A = \left[ -a_{ij} \right]$. Hence,
\begin{equation*}
A + (-A) = 0
\end{equation*}
holds for all matrices $A$ where, of course, $0$ is the zero matrix of the same size as $A$.

A closely related notion is that of subtracting matrices. If $A$ and $B$ are two $m \times n$ matrices, their difference $A - B$ is defined by
\begin{equation*}
A - B = A + (-B)
\end{equation*}
Note that if $A = \left[ a_{ij} \right]$ and $B = \left[ b_{ij} \right]$, then
\begin{equation*}
A - B = \left[ a_{ij} \right] + \left[ -b_{ij} \right] = \left[ a_{ij} - b_{ij} \right]
\end{equation*}
is the $m \times n$ matrix formed by subtracting corresponding entries.

Example 2.1.4

Let $A = \left[ \begin{array}{rrr}
3 & -1 & 0 \\
1 & 2 & -4
\end{array} \right]$, $B = \left[ \begin{array}{rrr}
1 & -1 & 1 \\
-2 & 0 & 6
\end{array} \right]$, $C = \left[ \begin{array}{rrr}
1 & 0 & -2 \\
3 & 1 & 1
\end{array} \right]$.
Compute $-A$, $A - B$, and $A + B - C$.

Solution:

\begin{align*}
-A &= \left[ \begin{array}{rrr}
-3 & 1 & 0 \\
-1 & -2 & 4
\end{array} \right] \\
A - B &= \left[ \begin{array}{lcr}
3 - 1 & -1 - (-1) & 0 - 1 \\
1 - (-2) & 2 - 0 & -4 - 6
\end{array} \right] = \left[ \begin{array}{rrr}
2 & 0 & -1 \\
3 & 2 & -10
\end{array} \right] \\
A + B - C &= \left[ \begin{array}{rrl}
3 + 1 - 1 & -1 - 1 - 0 & 0 + 1 - (-2) \\
1 - 2 - 3 & 2 + 0 - 1 & -4 + 6 - 1
\end{array} \right] = \left[ \begin{array}{rrr}
3 & -2 & 3 \\
-4 & 1 & 1
\end{array} \right]
\end{align*}
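
Negatives, differences, and longer sums such as those in Example 2.1.4 can be checked the same way; the following NumPy sketch (names ours) reproduces the three computations:
\begin{verbatim}
import numpy as np

A = np.array([[3, -1,  0], [ 1, 2, -4]])
B = np.array([[1, -1,  1], [-2, 0,  6]])
C = np.array([[1,  0, -2], [ 3, 1,  1]])

print(-A)          # the negative of A
print(A - B)       # [[ 2  0  -1], [ 3  2 -10]]
print(A + B - C)   # [[ 3 -2   3], [-4  1   1]]
\end{verbatim}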

Example 2.1.5

Solve
$\left[ \begin{array}{rr}
3 & 2 \\
-1 & 1
\end{array} \right] + X = \left[ \begin{array}{rr}
1 & 0 \\
-1 & 2
\end{array} \right]$
where $X$ is a matrix.

Solution:

We solve a numerical equation $a + x = b$ by subtracting the number $a$ from both sides to obtain $x = b - a$. This also works for matrices. To solve
$\left[ \begin{array}{rr}
3 & 2 \\
-1 & 1
\end{array} \right] + X = \left[ \begin{array}{rr}
1 & 0 \\
-1 & 2
\end{array} \right]$
simply subtract the matrix
$\left[ \begin{array}{rr}
3 & 2 \\
-1 & 1
\end{array} \right]$
from both sides to get
\begin{equation*}
X = \left[ \begin{array}{rr}
1 & 0 \\
-1 & 2
\end{array} \right]- \left[ \begin{array}{rr}
3 & 2 \\
-1 & 1
\end{array} \right] = \left[ \begin{array}{cr}
1 - 3 & 0 - 2 \\
-1 - (-1) & 2 - 1
\end{array} \right] = \left[ \begin{array}{rr}
-2 & -2 \\
0 & 1
\end{array} \right]
\end{equation*}
The reader should verify that this matrix $X$ does indeed satisfy the original equation.
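
One way to carry out this verification is numerically; a minimal NumPy sketch (the names M, B, and X are ours) computes $X = B - A$ and confirms it satisfies the equation:
\begin{verbatim}
import numpy as np

M = np.array([[ 3, 2], [-1, 1]])   # the matrix added to X
B = np.array([[ 1, 0], [-1, 2]])   # the right-hand side

X = B - M
print(X)                           # [[-2 -2], [ 0  1]]
print(np.array_equal(M + X, B))    # True: X satisfies the original equation
\end{verbatim}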

Example 2.1.5 solves the single matrix equation $A + X = B$ directly via matrix subtraction: $X = B - A$. This ability to work with matrices as entities lies at the heart of matrix algebra.

It is important to note that the sizes of matrices involved in some calculations are often determined by the context. For example, if
\begin{equation*}
A + C = \left[ \begin{array}{rrr}
1 & 3 & -1 \\
2 & 0 & 1
\end{array} \right]
\end{equation*}
then $A$ and $C$ must be the same size (so that $A + C$ makes sense), and that size must be $2 \times 3$ (so that the sum is $2 \times 3$). For simplicity we shall often omit reference to such facts when they are clear from the context.

Scalar Multiplication

In gaussian elimination, multiplying a row of a matrix by a number $k$ means multiplying every entry of that row by $k$.

 

Definition 2.2 Matrix Scalar Multiplication

More generally, if $A$ is any matrix and $k$ is any number, the scalar multiple $kA$ is the matrix obtained from $A$ by multiplying each entry of $A$ by $k$.


The term scalar arises here because the set of numbers from which the entries are drawn is usually referred to as the set of scalars. We have been using real numbers as scalars, but we could equally well have been using complex numbers.

 

Example 2.1.6

If $A = \left[ \begin{array}{rrr}
3 & -1 & 4 \\
2 & 0 & 6
\end{array} \right]$
and $ B = \left[ \begin{array}{rrr}
1 & 2 & -1 \\
0 & 3 & 2
\end{array} \right],$
compute $5A$, $\frac{1}{2}B$, and $3A - 2B$.

Solution:

\begin{align*}
5A &= \left[ \begin{array}{rrr}
15 & -5 & 20 \\
10 & 0 & 30
\end{array} \right], \quad \frac{1}{2}B = \left[ \begin{array}{rrr}
\frac{1}{2} & 1 & -\frac{1}{2} \\
0 & \frac{3}{2} & 1
\end{array} \right] \\
3A - 2B &= \left[ \begin{array}{rrr}
9 & -3 & 12 \\
6 & 0 & 18
\end{array} \right] - \left[ \begin{array}{rrr}
2 & 4 & -2 \\
0 & 6 & 4
\end{array} \right] = \left[ \begin{array}{rrr}
7 & -7 & 14 \\
6 & -6 & 14
\end{array} \right]
\end{align*}
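
Scalar multiples behave the same way in software; this NumPy sketch (names ours) reproduces Example 2.1.6:
\begin{verbatim}
import numpy as np

A = np.array([[3, -1, 4], [2, 0, 6]])
B = np.array([[1,  2, -1], [0, 3, 2]])

print(5 * A)           # 5A
print(0.5 * B)         # (1/2)B, with fractional entries
print(3 * A - 2 * B)   # [[ 7 -7 14], [ 6 -6 14]]
\end{verbatim}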

 

 

If $A$ is any matrix, note that $kA$ is the same size as $A$ for all scalars $k$. We also have
\begin{equation*}
0A = 0 \quad \mbox{ and } \quad k0 = 0
\end{equation*}
because the zero matrix has every entry zero. In other words, $kA = 0$ if either $k = 0$ or $A = 0$. The converse of this statement is also true, as Example 2.1.7 shows.

Example 2.1.7

If $kA = 0$, show that either $k = 0$ or $A = 0$.

Solution:

Write $A = \left[ a_{ij} \right]$ so that $kA = 0$ means $ka_{ij} = 0$ for all $i$ and $j$. If $k = 0$, there is nothing to do. If $k \neq 0$, then $ka_{ij} = 0$ implies that $a_{ij} = 0$ for all $i$ and $j$; that is, $A = 0$.

For future reference, the basic properties of matrix addition and scalar multiplication are listed in Theorem 2.1.1.

 

Theorem 2.1.1

Let $A$, $B$, and $C$ denote arbitrary $m \times n$ matrices where $m$ and $n$ are fixed. Let $k$ and $p$ denote arbitrary real numbers. Then

  1.  $A + B = B + A$.
  2. $A + (B + C) = (A + B) + C$.
  3.  There is an $m \times n$ matrix $0$, such that $0 + A = A$ for each $A$.
  4. For each $A$ there is an $m \times n$ matrix, $-A$, such that $A + (-A) = 0$.
  5.  $k(A + B) = kA + kB$.
  6.  $(k + p)A = kA + pA$.
  7.  $(kp)A = k(pA)$.
  8.  $1A = A$.

Proof:
Properties 1–4 were given previously. To check Property 5, let $A = \left[ a_{ij} \right]$ and $B = \left[ b_{ij} \right]$ denote matrices of the same size. Then $A + B = \left[ a_{ij} + b_{ij} \right]$, as before, so the $(i, j)$-entry of $k(A + B)$ is
\begin{equation*}
k(a_{ij} + b_{ij}) = ka_{ij} + kb_{ij}
\end{equation*}
But this is just the $(i, j)$-entry of $kA + kB$, and it follows that $k(A + B) = kA + kB$. The other properties can be verified similarly; the details are left to the reader.

The Properties in Theorem 2.1.1 enable us to do calculations with matrices in much the same way that
numerical calculations are carried out. To begin, Property 2 implies that the sum
\begin{equation*}
(A + B) + C = A + (B + C)
\end{equation*}
is the same no matter how it is formed and so is written as $A + B + C$. Similarly, the sum
\begin{equation*}
A + B + C + D
\end{equation*}
is independent of how it is formed; for example, it equals both $(A + B) + (C + D)$ and $A + \left[ B + (C + D) \right]$. Furthermore, property 1 ensures that, for example,
\begin{equation*}
B + D + A + C = A + B + C + D
\end{equation*} In other words, the order in which the matrices are added does not matter. A similar remark applies to sums of five (or more) matrices.

Properties 5 and 6 in Theorem 2.1.1  are called distributive laws for scalar multiplication, and they extend to sums of more than two terms. For example,
\begin{equation*}
k(A + B - C) = kA + kB - kC
\end{equation*}
\begin{equation*}
(k + p - m)A = kA + pA - mA
\end{equation*}
Similar observations hold for more than three summands. These facts, together with properties 7 and 8, enable us to simplify expressions by collecting like terms, expanding, and taking common factors in exactly the same way that algebraic expressions involving variables and real numbers are manipulated. The following example illustrates these techniques.

Example 2.1.8

Simplify $2(A + 3C) - 3(2C - B) - 3 \left[ 2(2A + B - 4C) - 4(A - 2C) \right]$ where $A$, $B$, and $C$ are all matrices of the same size.

Solution:

The reduction proceeds as though $A$, $B$, and $C$ were variables.
\begin{align*}
2(A &+ 3C) - 3(2C - B) - 3 \left[ 2(2A + B - 4C) - 4(A - 2C) \right] \\
&= 2A + 6C - 6C + 3B - 3 \left[ 4A + 2B - 8C - 4A + 8C \right] \\
&= 2A + 3B - 3 \left[ 2B \right] \\
&= 2A - 3B
\end{align*}
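
Identities like the one in Example 2.1.8 can also be spot-checked numerically: evaluate both sides on randomly chosen matrices and compare. A short NumPy sketch of this check (the random seed and matrix sizes are arbitrary choices of ours):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((2, 3)) for _ in range(3))

lhs = 2*(A + 3*C) - 3*(2*C - B) - 3*(2*(2*A + B - 4*C) - 4*(A - 2*C))
rhs = 2*A - 3*B
print(np.allclose(lhs, rhs))   # True for these random matrices
\end{verbatim}
Of course, agreement on random matrices is only evidence, not a proof; the algebraic simplification above is the argument.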

 

Transpose of a Matrix

Many results about a matrix $A$ involve the rows of $A$, and the corresponding result for columns is derived in an analogous way, essentially by replacing the word row by the word column throughout. The following definition is made with such applications in mind.

 

Definition  2.3 Transpose of a Matrix

If $A$ is an $m \times n$ matrix, the transpose of $A$, written $A^{T}$, is the $n \times m$ matrix whose rows are just the columns of $A$ in the same order.

In other words, the first row of $A^{T}$ is the first column of $A$ (that is, it consists of the entries of column 1 in order). Similarly, the second row of $A^{T}$ is the second column of $A$, and so on.

 

Example 2.1.9

Write down the transpose of each of the following matrices.
\begin{equation*}
A = \left[ \begin{array}{r}
1 \\
3 \\
2
\end{array} \right] \quad
B = \left[ \begin{array}{rrr}
5 & 2 & 6
\end{array} \right] \quad
C = \left[ \begin{array}{rr}
1 & 2 \\
3 & 4 \\
5 & 6
\end{array} \right] \quad
D = \left[ \begin{array}{rrr}
3 & 1 & -1 \\
1 & 3 & 2 \\
-1 & 2 & 1
\end{array} \right]
\end{equation*}

Solution:

\begin{equation*}
A^{T} = \left[ \begin{array}{rrr}
1 & 3 & 2
\end{array} \right],\ B^{T} = \left[ \begin{array}{r}
5 \\
2 \\
6
\end{array} \right],\ C^{T} = \left[ \begin{array}{rrr}
1 & 3 & 5 \\
2 & 4 & 6
\end{array} \right], \mbox{ and } D^{T} = D.
\end{equation*}
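
Transposition is a built-in operation in most numerical software; in NumPy it is the .T attribute. A brief sketch (names ours) checking two of the matrices above:
\begin{verbatim}
import numpy as np

C = np.array([[1, 2], [3, 4], [5, 6]])
D = np.array([[ 3, 1, -1],
              [ 1, 3,  2],
              [-1, 2,  1]])

print(C.T)                      # rows of C^T are the columns of C
print(np.array_equal(D.T, D))   # True: D equals its own transpose
\end{verbatim}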

 

If $A = \left[ a_{ij} \right]$ is a matrix, write $A^{T} = \left[ b_{ij} \right]$. Then $b_{ij}$ is the $j$th element of the $i$th row of $A^{T}$ and so is the $j$th element of the $i$th column of $A$. This means $b_{ij} = a_{ji}$, so the definition of $A^{T}$ can be stated as follows:
\begin{equation*}
\mbox{If } A = \left[ a_{ij} \right] \mbox{, then } A^{T} = \left[ a_{ji} \right]. \tag{2.1}
\end{equation*}
This is useful in verifying the following properties of transposition.

 

Theorem 2.1.2

Let $A$ and $B$ denote matrices of the same size, and let $k$ denote a scalar.

  1. If $A$ is an $m \times n$ matrix, then $A^{T}$ is an $n \times m$ matrix.
  2. $(A^{T})^{T} = A$.
  3.  $(kA)^{T} = kA^{T}$.
  4. $(A + B)^{T} = A^{T} + B^{T}$.

Proof:

Property 1 is part of the definition of $A^{T}$, and Property 2 follows from (2.1). As to Property 3: If $A = \left[ a_{ij} \right]$, then $kA = \left[ ka_{ij} \right]$, so (2.1) gives
\begin{equation*}
(kA)^{T} = \left[ ka_{ji} \right] = k \left[ a_{ji} \right] = kA^{T}
\end{equation*}
Finally, if $B = \left[ b_{ij} \right]$, then $A + B = \left[ c_{ij} \right]$ where $c_{ij} = a_{ij} + b_{ij}$. Then (2.1) gives Property 4:
\begin{equation*}
(A + B)^{T} = \left[ c_{ij} \right]^{T} = \left[ c_{ji} \right] = \left[ a_{ji} + b_{ji} \right] = \left[ a_{ji} \right] + \left[ b_{ji} \right] = A^{T} + B^{T}
\end{equation*}

There is another useful way to think of transposition. If $A = \left[ a_{ij} \right]$ is an $m \times n$ matrix, the elements $a_{11}, a_{22}, a_{33}, \dots$ are called the main diagonal of $A$. Hence the main diagonal extends down and to the right from the upper left corner of the matrix $A$.

Thus forming the transpose of a matrix $A$ can be viewed as “flipping” $A$ about its main diagonal, or as “rotating” $A$ through $180^{\circ}$ about the line containing the main diagonal. This makes Property 2 in Theorem 2.1.2 transparent.

Example 2.1.10

Solve for $A$ if $\left(2A^{T} - 3 \left[ \begin{array}{rr}
1 & 2 \\
-1 & 1
\end{array} \right] \right)^{T} = \left[ \begin{array}{rr}
2 & 3 \\
-1 & 2
\end{array} \right]$.

Solution:

Using Theorem 2.1.2, the left side of the equation is
\begin{equation*}
\left(2A^{T} - 3 \left[ \begin{array}{rr}
1 & 2 \\
-1 & 1
\end{array} \right]\right)^{T} = 2\left(A^{T}\right)^{T} - 3 \left[ \begin{array}{rr}
1 & 2 \\
-1 & 1
\end{array} \right]^{T} = 2A - 3 \left[ \begin{array}{rr}
1 & -1 \\
2 & 1
\end{array} \right]
\end{equation*}
Hence the equation becomes
\begin{equation*}
2A - 3 \left[ \begin{array}{rr}
1 & -1 \\
2 & 1
\end{array} \right] = \left[ \begin{array}{rr}
2 & 3 \\
-1 & 2
\end{array} \right]
\end{equation*}
Thus
$2A = \left[ \begin{array}{rr}
2 & 3 \\
-1 & 2
\end{array} \right] + 3 \left[ \begin{array}{rr}
1 & -1 \\
2 & 1
\end{array} \right] = \left[ \begin{array}{rr}
5 & 0 \\
5 & 5
\end{array} \right]$, so finally
$A = \frac{1}{2} \left[ \begin{array}{rr}
5 & 0 \\
5 & 5
\end{array} \right] = \frac{5}{2} \left[ \begin{array}{rr}
1 & 0 \\
1 & 1
\end{array} \right]$.
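
As with Example 2.1.5, the answer can be verified numerically. A minimal NumPy sketch (the names M and R for the two given matrices are ours):
\begin{verbatim}
import numpy as np

M = np.array([[ 1, 2], [-1, 1]])       # the matrix multiplied by 3
R = np.array([[ 2, 3], [-1, 2]])       # the right-hand side
A = np.array([[ 5, 0], [ 5, 5]]) / 2   # the claimed solution

print(np.allclose((2 * A.T - 3 * M).T, R))   # True: A satisfies the equation
\end{verbatim}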

Note that Example 2.1.10 can also be solved by first transposing both sides, then solving for $A^{T}$, and so obtaining $A = (A^{T})^{T}$. The reader should do this.

The matrix $D = \left[ \begin{array}{rrr}
3 & 1 & -1 \\
1 & 3 & 2 \\
-1 & 2 & 1 \end{array}\right]$ in Example 2.1.9 has the property that $D = D^{T}$. Such matrices are important; a matrix $A$ is called symmetric if $A = A^{T}$. A symmetric matrix $A$ is necessarily square (if $A$ is $m \times n$, then $A^{T}$ is $n \times m$, so $A = A^{T}$ forces $n = m$). The name comes from the fact that these matrices exhibit a symmetry about the main diagonal. That is, entries that are directly across the main diagonal from each other are equal.

For example, $\left[ \begin{array}{ccc}
a & b & c \\
b^\prime & d & e \\
c^\prime & e^\prime & f
\end{array} \right]$ is symmetric when $b = b^\prime$, $c = c^\prime$, and $e = e^\prime$.

 

 

Example 2.1.11

If $A$ and $B$ are symmetric $n \times n$ matrices, show that $A + B$ is symmetric.

Solution:

We have $A^{T} = A$ and $B^{T} = B$, so, by Theorem 2.1.2, we have $(A + B)^{T} = A^{T} + B^{T} = A + B$. Hence $A + B$ is symmetric.
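
A quick numerical illustration of Example 2.1.11 (the particular symmetric matrices are our choice):
\begin{verbatim}
import numpy as np

A = np.array([[1, 2], [2, 5]])
B = np.array([[0, 3], [3, -1]])

S = A + B
print(np.array_equal(S, S.T))   # True: the sum is again symmetric
\end{verbatim}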

 

 

 

Example 2.1.12

Suppose a square matrix $A$ satisfies $A = 2A^{T}$. Show that necessarily $A = 0$.

Solution:

If we iterate the given equation, Theorem 2.1.2 gives
\begin{equation*}
A = 2A^{T} = 2 {\left[ 2A^{T} \right]}^T = 2 \left[ 2(A^{T})^{T} \right] = 4A
\end{equation*}
Subtracting $A$ from both sides gives $3A = 0$, so $A = \frac{1}{3}(0) = 0$.

 

2.2 Matrix-Vector Multiplication

Up to now we have used matrices to solve systems of linear equations by manipulating the rows of the augmented matrix. In this section we introduce a different way of describing linear systems that makes more use of the coefficient matrix of the system and leads to a useful way of “multiplying” matrices.

Vectors

It is a well-known fact in analytic geometry that two points in the plane with coordinates $(a_{1}, a_{2})$ and $(b_{1}, b_{2})$ are equal if and only if $a_{1} = b_{1}$ and $a_{2} = b_{2}$. Moreover, a similar condition applies to points $(a_{1}, a_{2}, a_{3})$ in space. We extend this idea as follows.

An ordered sequence $(a_{1}, a_{2}, \dots, a_{n})$ of real numbers is called an ordered $\bm{n}$-tuple. The word “ordered” here reflects our insistence that two ordered $n$-tuples are equal if and only if corresponding entries are the same. In other words,
\begin{equation*}
(a_{1}, a_{2}, \dots, a_{n}) = (b_{1}, b_{2}, \dots, b_{n}) \quad \mbox{if and only if} \quad a_{1} = b_{1}, a_{2} = b_{2}, \dots, \mbox{ and } a_{n} = b_{n}.
\end{equation*}
Thus the ordered $2$-tuples and $3$-tuples are just the ordered pairs and triples familiar from geometry.

 

Definition 2.4 The set $\mathbb{R}^n$ of ordered $n$-tuples of real numbers

Let $\mathbb{R}$ denote the set of all real numbers. The set of all ordered $n$-tuples from $\mathbb{R}$ has a special notation:
\begin{equation*}
\mathbb{R}^{n} \mbox{ denotes the set of all ordered }n\mbox{-tuples of real numbers.}
\end{equation*}

 

There are two commonly used ways to denote the $n$-tuples in $\mathbb{R}^{n}$: As rows $(r_{1}, r_{2}, \dots, r_{n})$ or columns $\left[ \begin{array}{c}
r_{1} \\
r_{2} \\
\vdots \\
r_{n}
\end{array} \right]$;
the notation we use depends on the context. In any event they are called vectors or $n$-vectors and will be denoted using bold type such as $\textbf{x}$ or $\textbf{v}$. For example, an $m \times n$ matrix $A$ will be written as a row of columns:
\begin{equation*}
A = \left[ \begin{array}{cccc}
\textbf{a}_{1} & \textbf{a}_{2} & \cdots & \textbf{a}_{n}
\end{array} \right] \mbox{ where } \textbf{a}_{j} \mbox{ denotes column } j \mbox{ of } A \mbox{ for each } j.
\end{equation*}

 

If $\textbf{x}$ and $\textbf{y}$ are two $n$-vectors in $\mathbf{R}^n$, it is clear that their matrix sum $\textbf{x} + \textbf{y}$ is also in $\mathbf{R}^n$ as is the scalar multiple $k\textbf{x}$ for any real number $k$. We express this observation by saying that $\mathbf{R}^n$ is closed under addition and scalar multiplication. In particular, all the basic properties in Theorem 2.1.1 are true of these $n$-vectors. These properties are fundamental and will be used frequently below without comment. As for matrices in general, the $n \times 1$ zero matrix is called the zero $\bm{n}$-vector in $\mathbf{R}^n$ and, if $\textbf{x}$ is an $n$-vector, the $n$-vector $-\textbf{x}$ is called the negative of $\textbf{x}$.

Of course, we have already encountered these $n$-vectors in Section 1.3 as the solutions to systems of linear equations with $n$ variables. In particular we defined the notion of a linear combination of vectors and showed that a linear combination of solutions to a homogeneous system is again a solution. Clearly, a linear combination of $n$-vectors in $\mathbf{R}^n$ is again in $\mathbf{R}^n$, a fact that we will be using.

Matrix-Vector Multiplication

Given a system of linear equations, the left sides of the equations depend only on the coefficient matrix $A$ and the column $\textbf{x}$ of variables, and not on the constants. This observation leads to a fundamental idea in linear algebra: We view the left sides of the equations as the “product” $A\textbf{x}$ of the matrix $A$ and the vector $\textbf{x}$. This simple change of perspective leads to a completely new way of viewing linear systems—one that is very useful and will occupy our attention throughout this book.

To motivate the definition of the “product” $A\textbf{x}$, consider first the following system of two equations in three variables:
\begin{equation*}\tag{2.2}
\arraycolsep=1pt
\begin{array}{rrrrrrr}
ax_{1} & + & bx_{2} & + & cx_{3} & = & b_{1} \\
a^{\prime}x_{1} & + & b^{\prime}x_{2} & + & c^{\prime}x_{3} & = & b_{2}
\end{array}
\end{equation*}

and let $A = \left[ \begin{array}{ccc}
a & b & c \\
a^\prime & b^\prime & c^\prime
\end{array} \right]$, $\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
x_{3}
\end{array} \right]$, $\textbf{b} = \left[ \begin{array}{c}
b_{1} \\
b_{2}
\end{array} \right]$ denote the coefficient matrix, the variable matrix, and the constant matrix, respectively. The system (2.2) can be expressed as a single vector equation
\begin{equation*}
\left[ \arraycolsep=1pt \begin{array}{rrrrr}
ax_{1} & + & bx_{2} & + & cx_{3} \\
a^\prime x_{1} & + & b^\prime x_{2} & + & c^\prime x_{3}
\end{array} \right] = \left[ \begin{array}{c}
b_{1} \\
b_{2}
\end{array} \right]
\end{equation*}
which in turn can be written as follows:
\begin{equation*}
x_{1} \left[ \begin{array}{c}
a \\
a^\prime
\end{array} \right] +
x_{2} \left[ \begin{array}{c}
b \\
b^\prime
\end{array} \right] +
x_{3} \left[ \begin{array}{c}
c \\
c^\prime
\end{array} \right] = \left[ \begin{array}{c}
b_{1} \\
b_{2}
\end{array} \right]
\end{equation*}
Now observe that the vectors appearing on the left side are just the columns
\begin{equation*}
\textbf{a}_{1} = \left[ \begin{array}{c}
a \\
a^\prime
\end{array} \right],
\textbf{a}_{2} = \left[ \begin{array}{c}
b \\
b^\prime
\end{array} \right], \mbox{ and }
\textbf{a}_{3} = \left[ \begin{array}{c}
c \\
c^\prime
\end{array} \right]
\end{equation*}
of the coefficient matrix $A$. Hence the system (2.2) takes the form
\begin{equation*} \tag{2.3}
x_{1}\textbf{a}_{1} + x_{2}\textbf{a}_{2} + x_{3}\textbf{a}_{3} = \textbf{b}
\end{equation*}
This shows that the system (2.2) has a solution if and only if the constant matrix $\textbf{b}$ is a linear combination of the columns of $A$, and that in this case the entries of the solution are the coefficients $x_{1}$, $x_{2}$, and $x_{3}$ in this linear combination.

Moreover, this holds in general. If $A$ is any $m \times n$ matrix, it is often convenient to view $A$ as a row of columns. That is, if $\textbf{a}_{1}, \textbf{a}_{2}, \dots, \textbf{a}_{n}$ are the columns of $A$, we write
\begin{equation*}
A = \left[ \begin{array}{cccc}
\textbf{a}_{1} & \textbf{a}_{2} & \cdots & \textbf{a}_{n}
\end{array} \right]
\end{equation*}
and say that $A = \left[ \begin{array}{cccc}\textbf{a}_{1} & \textbf{a}_{2} & \cdots & \textbf{a}_{n}
\end{array} \right]$ is given in terms of its columns.

Now consider any system of linear equations with $m \times n$ coefficient matrix $A$. If $\textbf{b}$ is the constant matrix of the system, and if $\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
\vdots \\
x_{n}
\end{array} \right]$
is the matrix of variables then, exactly as above, the system can be written as a single vector equation
\begin{equation*} \tag{2.4}
x_{1}\textbf{a}_{1} + x_{2}\textbf{a}_{2} + \dots + x_{n}\textbf{a}_{n} = \textbf{b}
\end{equation*}

Example 2.2.1

Write the system
$\left\lbrace
\arraycolsep=1pt
\begin{array}{rrrrrrr}
3x_{1} & + & 2x_{2} & - & 4x_{3} & = & 0 \\
x_{1} & - & 3x_{2} & + & x_{3} & = & 3 \\
& & x_{2} & - & 5x_{3} & = & -1
\end{array} \right.$
in the form given in (2.4).

Solution:

\begin{equation*}
x_{1} \left[ \begin{array}{r}
3 \\
1 \\
0
\end{array} \right] +
x_{2} \left[ \begin{array}{r}
2 \\
-3 \\
1
\end{array} \right] +
x_{3} \left[ \begin{array}{r}
-4 \\
1 \\
-5
\end{array} \right] =
\left[ \begin{array}{r}
0 \\
3 \\
-1
\end{array} \right]
\end{equation*}

As mentioned above, we view the left side of (2.4) as the product of the matrix $A$ and the vector $\textbf{x}$. This basic idea is formalized in the following definition:

 

Definition 2.5 Matrix-Vector Multiplication

Let $A = \left[ \begin{array}{cccc}
\textbf{a}_{1} &\textbf{a}_{2} & \cdots & \textbf{a}_{n}
\end{array} \right]$ be an $m \times n$ matrix, written in terms of its columns $\textbf{a}_{1}, \textbf{a}_{2}, \dots, \textbf{a}_{n}$. If $\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
\vdots \\
x_{n}
\end{array} \right]$
is any $n$-vector, the product $A\textbf{x}$ is defined to be the $m$-vector given by:
\begin{equation*}
A\textbf{x} = x_{1}\textbf{a}_{1} + x_{2}\textbf{a}_{2} + \cdots + x_{n}\textbf{a}_{n}
\end{equation*}

In other words, if $A$ is $m \times n$ and $\textbf{x}$ is an $n$-vector, the product $A\textbf{x}$ is the linear combination of the columns of $A$ where the coefficients are the entries of $\textbf{x}$ (in order).

Note that if $A$ is an $m \times n$ matrix, the product $A\textbf{x}$ is only defined if $\textbf{x}$ is an $n$-vector and then the vector $A\textbf{x}$ is an $m$-vector because this is true of each column $\textbf{a}_{j}$ of $A$. But in this case the system of linear equations with coefficient matrix $A$ and constant vector $\textbf{b}$ takes the form of a single matrix equation
\begin{equation*}
A\textbf{x} = \textbf{b}
\end{equation*}
The following theorem combines Definition 2.5 and equation (2.4) and summarizes the above discussion. Recall that a system of linear equations is said to be consistent if it has at least one solution.

Theorem 2.2.1

  1. Every system of linear equations has the form $A\textbf{x} = \textbf{b}$ where $A$ is the coefficient matrix, $\textbf{b}$ is the constant matrix, and $\textbf{x}$ is the matrix of variables.
  2. The system $A\textbf{x} = \textbf{b}$ is consistent if and only if $\textbf{b}$ is a linear combination of the columns of $A$.
  3. If $\textbf{a}_{1}, \textbf{a}_{2}, \dots, \textbf{a}_{n}$ are the columns of $A$ and if $\textbf{x} = \left[ \begin{array}{c}
    x_{1} \\
    x_{2} \\
    \vdots \\
    x_{n}
    \end{array} \right]$, then $\textbf{x}$ is a solution to the linear system $A\textbf{x} = \textbf{b}$ if and only if $x_{1}, x_{2}, \dots, x_{n}$ are a solution of the vector equation
    \begin{equation*}
    x_{1}\textbf{a}_{1} + x_{2}\textbf{a}_{2} + \cdots + x_{n}\textbf{a}_{n} = \textbf{b} \end{equation*}

 

A system of linear equations in the form $A\textbf{x} = \textbf{b}$ as in (1) of Theorem 2.2.1 is said to be written in matrix form. This is a useful way to view linear systems as we shall see.

Theorem 2.2.1 transforms the problem of solving the linear system $A\textbf{x} = \textbf{b}$ into the problem of expressing the constant matrix $\textbf{b}$ as a linear combination of the columns of the coefficient matrix $A$. Such a change in perspective is very useful because one approach or the other may be better in a particular situation; the importance of the theorem is that there is a choice.

Example 2.2.2

If $A = \left[ \begin{array}{rrrr}
2 & -1 & 3 & 5 \\
0 & 2 & -3 & 1 \\
-3 & 4 & 1 & 2
\end{array} \right]$ and
$\textbf{x} = \left[ \begin{array}{r}
2 \\
1 \\
0 \\
-2
\end{array} \right]$, compute $A\textbf{x}$.

Solution:

By Definition 2.5:
$A\textbf{x} = 2 \left[ \begin{array}{r}
2 \\
0 \\
-3
\end{array} \right] + 1
\left[ \begin{array}{r}
-1 \\
2 \\
4
\end{array} \right] + 0
\left[ \begin{array}{r}
3 \\
-3 \\
1
\end{array} \right] - 2
\left[ \begin{array}{r}
5 \\
1 \\
2
\end{array} \right] =
\left[ \begin{array}{r}
-7 \\
0 \\
-6
\end{array} \right]$.
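
Definition 2.5 can be carried out directly in software by forming the linear combination of columns; NumPy's built-in product A @ x gives the same vector. A sketch reproducing Example 2.2.2 (names ours):
\begin{verbatim}
import numpy as np

A = np.array([[ 2, -1,  3, 5],
              [ 0,  2, -3, 1],
              [-3,  4,  1, 2]])
x = np.array([2, 1, 0, -2])

# Definition 2.5: Ax is a linear combination of the columns of A
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
print(combo)   # [-7  0 -6]
print(A @ x)   # the same result from the built-in matrix-vector product
\end{verbatim}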

Example 2.2.3

Given columns $\textbf{a}_{1}$, $\textbf{a}_{2}$, $\textbf{a}_{3}$, and $\textbf{a}_{4}$ in $\mathbf{R}^3$, write $2\textbf{a}_{1} - 3\textbf{a}_{2} + 5\textbf{a}_{3} + \textbf{a}_{4}$ in the form $A\textbf{x}$ where $A$ is a matrix and $\textbf{x}$ is a vector.

 

Solution:

Here the column of coefficients is
$
\textbf{x} = \left[ \begin{array}{r}
2 \\
-3 \\
5 \\
1
\end{array} \right]. $
Hence Definition 2.5 gives
\begin{equation*}
A\textbf{x} = 2\textbf{a}_{1} - 3\textbf{a}_{2} + 5\textbf{a}_{3} + \textbf{a}_{4}
\end{equation*}
where $A = \left[ \begin{array}{cccc}
\textbf{a}_{1} & \textbf{a}_{2} & \textbf{a}_{3} & \textbf{a}_{4}
\end{array} \right]$ is the matrix with $\textbf{a}_{1}$, $\textbf{a}_{2}$, $\textbf{a}_{3}$, and $\textbf{a}_{4}$ as its columns.

 

 

Example 2.2.4

Let $A = \left[ \begin{array}{cccc}
\textbf{a}_{1} & \textbf{a}_{2} & \textbf{a}_{3} & \textbf{a}_{4}
\end{array} \right]$ be the $3 \times 4$ matrix given in terms of its columns
$\textbf{a}_{1} = \left[ \begin{array}{r}
2 \\
0 \\
-1
\end{array} \right]$, $\textbf{a}_{2} = \left[ \begin{array}{r}
1 \\
1 \\
1
\end{array} \right]$, $\textbf{a}_{3} = \left[ \begin{array}{r}
3 \\
-1 \\
-3
\end{array} \right]$, and $\textbf{a}_{4} = \left[ \begin{array}{r}
3 \\
1 \\
0
\end{array} \right]$.
In each case below, either express $\textbf{b}$ as a linear combination of $\textbf{a}_{1}$, $\textbf{a}_{2}$, $\textbf{a}_{3}$, and $\textbf{a}_{4}$, or show that it is not such a linear combination. Explain what your answer means for the corresponding system $A\textbf{x} = \textbf{b}$ of linear equations.

1. $\textbf{b} = \left[ \begin{array}{r}
1 \\
2 \\
3
\end{array} \right] $

2. $\textbf{b} = \left[ \begin{array}{r}
4 \\
2 \\
1
\end{array} \right]$

 

Solution:

By Theorem 2.2.1, $\textbf{b}$ is a linear combination of $\textbf{a}_{1}$, $\textbf{a}_{2}$, $\textbf{a}_{3}$, and $\textbf{a}_{4}$ if and only if the system $A\textbf{x} = \textbf{b}$ is consistent (that is, it has a solution). So in each case we carry the augmented matrix $\left[ A|\textbf{b} \right]$ of the system $A\textbf{x} = \textbf{b}$ to reduced form.

1.  Here
$
\left[ \begin{array}{rrrr|r}
2 & 1 & 3 & 3 & 1 \\
0 & 1 & -1 & 1 & 2 \\
-1 & 1 & -3 & 0 & 3
\end{array} \right]
\rightarrow
\left[ \begin{array}{rrrr|r}
1 & 0 & 2 & 1 & 0 \\
0 & 1 & -1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{array} \right]$, so the system $A\textbf{x} = \textbf{b}$ has no solution in this case. Hence $\textbf{b}$ is \textit{not} a linear combination of $\textbf{a}_{1}$, $\textbf{a}_{2}$, $\textbf{a}_{3}$, and $\textbf{a}_{4}$.

2.  Now
$
\left[ \begin{array}{rrrr|r}
2 & 1 & 3 & 3 & 4 \\
0 & 1 & -1 & 1 & 2 \\
-1 & 1 & -3 & 0 & 1
\end{array} \right]
\rightarrow
\left[ \begin{array}{rrrr|r}
1 & 0 & 2 & 1 & 1 \\
0 & 1 & -1 & 1 & 2 \\
0 & 0 & 0 & 0 & 0
\end{array} \right]$, so the system $A\textbf{x} = \textbf{b}$ is consistent.

 

Thus $\textbf{b}$ is a linear combination of $\textbf{a}_{1}$, $\textbf{a}_{2}$, $\textbf{a}_{3}$, and $\textbf{a}_{4}$ in this case. In fact the general solution is $x_{1} = 1 - 2s - t$, $x_{2} = 2 + s - t$, $x_{3} = s$, and $x_{4} = t$ where $s$ and $t$ are arbitrary parameters. Hence $x_{1}\textbf{a}_{1} + x_{2}\textbf{a}_{2} + x_{3}\textbf{a}_{3} + x_{4}\textbf{a}_{4} = \textbf{b} = \left[ \begin{array}{r}
4 \\
2 \\
1
\end{array} \right]$
for any choice of $s$ and $t$. If we take $s = 0$ and $t = 0$, this becomes $\textbf{a}_{1} + 2\textbf{a}_{2} = \textbf{b}$, whereas taking $s = 1 = t$ gives $-2\textbf{a}_{1} + 2\textbf{a}_{2} + \textbf{a}_{3} + \textbf{a}_{4} = \textbf{b}$.
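
Both linear combinations found above are easy to confirm numerically (the vector names follow the example):
\begin{verbatim}
import numpy as np

a1 = np.array([ 2, 0, -1])
a2 = np.array([ 1, 1,  1])
a3 = np.array([ 3, -1, -3])
a4 = np.array([ 3, 1,  0])
b  = np.array([ 4, 2,  1])

print(np.array_equal(a1 + 2*a2, b))                # True (s = t = 0)
print(np.array_equal(-2*a1 + 2*a2 + a3 + a4, b))   # True (s = t = 1)
\end{verbatim}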

Example 2.2.5

Taking $A$ to be the zero matrix, we have $0\textbf{x} = \textbf{0}$ for all vectors $\textbf{x}$ by Definition 2.5 because every column of the zero matrix is zero. Similarly, $A\textbf{0} = \textbf{0}$ for all matrices $A$ because every entry of the zero vector is zero.

 

Example 2.2.6

If $I = \left[ \begin{array}{rrr}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{array} \right]$, show that $I\textbf{x} = \textbf{x}$ for any vector $\textbf{x}$ in $\mathbf{R}^3$.

Solution:

If $\textbf{x} = \left[ \begin{array}{r}
x_{1} \\
x_{2} \\
x_{3}
\end{array} \right],$ then Definition 2.5 gives
\begin{equation*}
I\textbf{x} = x_{1} \left[ \begin{array}{r}
1 \\
0 \\
0 \\
\end{array} \right] +
x_{2} \left[ \begin{array}{r}
0 \\
1 \\
0 \\
\end{array} \right] +
x_{3} \left[ \begin{array}{r}
0 \\
0 \\
1 \\
\end{array} \right] =
\left[ \begin{array}{r}
x_{1} \\
0 \\
0 \\
\end{array} \right] +
\left[ \begin{array}{r}
0 \\
x_{2} \\
0 \\
\end{array} \right] +
\left[ \begin{array}{r}
0 \\
0 \\
x_{3} \\
\end{array} \right] =
\left[ \begin{array}{r}
x_{1} \\
x_{2} \\
x_{3} \\
\end{array} \right] = \textbf{x}
\end{equation*}

The matrix $I$ in Example 2.2.6 is called the $3 \times 3$ identity matrix, and we will encounter such matrices again later. Before proceeding, we develop some algebraic properties of matrix-vector multiplication that are used extensively throughout linear algebra.

Theorem 2.2.2

Let $A$ and $B$ be $m \times n$ matrices, and let $\textbf{x}$ and $\textbf{y}$ be $n$-vectors in $\mathbf{R}^n$. Then:

  1. $A(\textbf{x} + \textbf{y}) = A\textbf{x} + A\textbf{y}$.
  2. $A(a\textbf{x}) = a(A\textbf{x}) = (aA)\textbf{x}$ for all scalars $a$.
  3. $(A + B)\textbf{x} = A\textbf{x} + B\textbf{x}$.

 

Proof:

We prove (3); the other verifications are similar and are left as exercises. Let $A = \left[ \begin{array}{cccc}
\textbf{a}_{1} & \textbf{a}_{2} & \cdots & \textbf{a}_{n}
\end{array} \right]$ and $B = \left[ \begin{array}{cccc}
\textbf{b}_{1} & \textbf{b}_{2} & \cdots & \textbf{b}_{n}
\end{array} \right]$ be given in terms of their columns. Since adding two matrices is the same as adding their columns, we have
\begin{equation*}
A + B = \left[ \begin{array}{cccc}
\textbf{a}_{1} + \textbf{b}_{1} & \textbf{a}_{2} + \textbf{b}_{2} & \cdots & \textbf{a}_{n} + \textbf{b}_{n}
\end{array} \right]
\end{equation*}
If we write $\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
\vdots \\
x_{n}
\end{array} \right]$, then Definition 2.5 gives
\begin{align*}
(A + B)\textbf{x} &= x_{1}(\textbf{a}_{1} + \textbf{b}_{1}) + x_{2}(\textbf{a}_{2} + \textbf{b}_{2}) + \dots + x_{n}(\textbf{a}_{n} + \textbf{b}_{n}) \\
&= (x_{1}\textbf{a}_{1} + x_{2}\textbf{a}_{2} + \dots + x_{n}\textbf{a}_{n}) + (x_{1}\textbf{b}_{1} + x_{2}\textbf{b}_{2} + \dots + x_{n}\textbf{b}_{n})\\
&= A\textbf{x} + B\textbf{x}
\end{align*}

Theorem 2.2.2 allows matrix-vector computations to be carried out much as in ordinary arithmetic. For example, for any $m \times n$ matrices $A$ and $B$ and any $n$-vectors $\textbf{x}$ and $\textbf{y}$, we have:
\begin{equation*}
A(2 \textbf{x} - 5 \textbf{y}) = 2A\textbf{x} - 5A\textbf{y} \quad \mbox{ and } \quad (3A - 7B)\textbf{x} = 3A\textbf{x} - 7B\textbf{x}
\end{equation*}
We will use such manipulations throughout the book, often without mention.
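
The three parts of Theorem 2.2.2 can be illustrated numerically by evaluating both sides on randomly chosen matrices and vectors (a sketch only; the sizes and seed are our choices):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
y = rng.standard_normal(4)

print(np.allclose(A @ (x + y), A @ x + A @ y))   # part 1
print(np.allclose(A @ (7 * x), 7 * (A @ x)))     # part 2, with a = 7
print(np.allclose((A + B) @ x, A @ x + B @ x))   # part 3
\end{verbatim}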

Linear Equations

Theorem 2.2.2 also gives a useful way to describe the solutions to a system
\begin{equation*}
A\textbf{x} = \textbf{b}
\end{equation*}
of linear equations. There is a related system
\begin{equation*}
A\textbf{x} = \textbf{0}
\end{equation*}
called the associated homogeneous system, obtained from the original system $A\textbf{x} = \textbf{b}$ by replacing all the constants by zeros. Suppose $\textbf{x}_{1}$ is a solution to $A\textbf{x} = \textbf{b}$ and $\textbf{x}_{0}$ is a solution to $A\textbf{x} = \textbf{0}$ (that is $A\textbf{x}_{1} = \textbf{b}$ and $A\textbf{x}_{0} = \textbf{0}$). Then $\textbf{x}_{1} + \textbf{x}_{0}$ is another solution to $A\textbf{x} = \textbf{b}$. Indeed, Theorem 2.2.2 gives
\begin{equation*}
A(\textbf{x}_{1} + \textbf{x}_{0}) = A\textbf{x}_{1} + A\textbf{x}_{0} = \textbf{b} + \textbf{0} = \textbf{b}
\end{equation*}
This observation has a useful converse.

Theorem 2.2.3

Suppose $\textbf{x}_{1}$ is any particular solution to the system $A\textbf{x} = \textbf{b}$ of linear equations. Then every solution $\textbf{x}_{2}$ to $A\textbf{x} = \textbf{b}$ has the form
\begin{equation*}
\textbf{x}_{2} = \textbf{x}_{0} + \textbf{x}_{1}
\end{equation*}
for some solution $\textbf{x}_{0}$ of the associated homogeneous system $A\textbf{x} = \textbf{0}$.

 

Proof:
Suppose $\textbf{x}_{2}$ is also a solution to $A\textbf{x} = \textbf{b}$, so that $A\textbf{x}_{2} = \textbf{b}$. Write $\textbf{x}_{0} = \textbf{x}_{2} - \textbf{x}_{1}$. Then $\textbf{x}_{2} = \textbf{x}_{0} + \textbf{x}_{1}$ and, using Theorem 2.2.2, we compute
\begin{equation*}
A\textbf{x}_{0} = A(\textbf{x}_{2} - \textbf{x}_{1}) = A\textbf{x}_{2} - A\textbf{x}_{1} = \textbf{b} - \textbf{b} = \textbf{0}
\end{equation*}
Hence $\textbf{x}_{0}$ is a solution to the associated homogeneous system $A\textbf{x} = \textbf{0}$.

Note that gaussian elimination provides one such representation.

 

Example 2.2.7

Express every solution to the following system as the sum of a specific solution plus a solution to the associated homogeneous system.
\begin{equation*}
\arraycolsep=1pt
\begin{array}{rrrrrrrrr}
x_{1} & - & x_{2} & - & x_{3} & + & 3x_{4} & = & 2 \\
2x_{1} & - & x_{2} & - & 3x_{3} & + & 4x_{4} & = & 6 \\
x_{1} & & & - & 2x_{3} & + & x_{4} & = & 4
\end{array}
\end{equation*}

Solution:

Gaussian elimination gives $x_{1} = 4 + 2s - t$, $x_{2} = 2 + s + 2t$, $x_{3} = s$, and $x_{4} = t$ where $s$ and $t$ are arbitrary parameters. Hence the general solution can be written
\begin{equation*}
\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
x_{4}
\end{array} \right] =
\left[ \begin{array}{c}
4 + 2s - t \\
2 + s + 2t \\
s \\
t
\end{array} \right] =
\left[ \begin{array}{r}
4 \\
2 \\
0 \\
0
\end{array} \right] +
\left( s \left[ \begin{array}{r}
2 \\
1 \\
1 \\
0
\end{array} \right] + t \left[ \begin{array}{r}
-1 \\
2 \\
0 \\
1
\end{array} \right] \right)
\end{equation*}
Thus
$\textbf{x}_1 = \left[ \begin{array}{r}
4 \\
2 \\
0 \\
0
\end{array} \right]$
is a particular solution (where $s = 0 = t$), and
$
\textbf{x}_{0} = s \left[ \begin{array}{r}
2 \\
1 \\
1 \\
0
\end{array} \right] + t \left[ \begin{array}{r}
-1 \\
2 \\
0 \\
1
\end{array} \right]$ gives all solutions to the associated homogeneous system. (To see why this is so, carry out the gaussian elimination again but with all the constants set equal to zero.)
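
The decomposition in Example 2.2.7 can be checked numerically: the particular solution satisfies $A\textbf{x} = \textbf{b}$, while the two basic vectors satisfy the associated homogeneous system. A NumPy sketch (names ours):
\begin{verbatim}
import numpy as np

A = np.array([[1, -1, -1, 3],
              [2, -1, -3, 4],
              [1,  0, -2, 1]])
b = np.array([2, 6, 4])

x1 = np.array([4, 2, 0, 0])     # particular solution (s = t = 0)
h1 = np.array([2, 1, 1, 0])     # basic solutions of Ax = 0
h2 = np.array([-1, 2, 0, 1])

print(np.array_equal(A @ x1, b))                   # True
print(A @ h1, A @ h2)                              # both are the zero vector
print(np.array_equal(A @ (x1 + 3*h1 - 2*h2), b))   # True for any choice of s and t
\end{verbatim}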

 

The following useful result is included with no proof.

Theorem 2.2.4

Let $A\textbf{x} = \textbf{b}$ be a system of equations with augmented matrix $\left[ \begin{array}{c|c} A & \textbf{b} \end{array}\right]$. Write $\text{rank } A = r$.

  1. $\text{rank} \left[ \begin{array}{c|c} A & \textbf{b} \end{array}\right]$ is either $r$ or $r+1$.
  2. The system is consistent if and only if $\text{rank} \left[ \begin{array}{c|c} A & \textbf{b} \end{array}\right] = r$.
  3. The system is inconsistent if and only if $\text{rank} \left[ \begin{array}{c|c} A & \textbf{b} \end{array}\right] = r+1$.

 

The Dot Product

Definition 2.5 is not always the easiest way to compute a matrix-vector product $A\textbf{x}$ because it requires that the columns of $A$ be explicitly identified. There is another way to find such a product which uses the matrix $A$ as a whole with no reference to its columns, and hence is useful in practice. The method depends on the following notion.

 

Definition 2.6 Dot Product in $\mathbb{R}^n$

If $(a_{1}, a_{2}, \dots, a_{n})$ and $(b_{1}, b_{2}, \dots, b_{n})$ are two ordered $n$-tuples, their $\textbf{dot product}$ is defined to be the number
\begin{equation*}
a_{1}b_{1} + a_{2}b_{2} + \dots + a_{n}b_{n}
\end{equation*}
obtained by multiplying corresponding entries and adding the results.

To see how this relates to matrix products, let $A$ denote a $3 \times 4$ matrix and let $\textbf{x}$ be a $4$-vector. Writing
\begin{equation*}
\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
x_{4}
\end{array} \right] \quad \mbox{ and } \quad
A = \left[ \begin{array}{cccc}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{array} \right]
\end{equation*}
in the notation of Section 2.1, we compute
\begin{align*}
A\textbf{x} = \left[ \begin{array}{cccc}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{array} \right] \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
x_{4}
\end{array} \right] &= x_{1} \left[ \begin{array}{c}
a_{11} \\
a_{21} \\
a_{31}
\end{array} \right] + x_{2} \left[ \begin{array}{c}
a_{12} \\
a_{22} \\
a_{32}
\end{array} \right] + x_{3} \left[ \begin{array}{c}
a_{13} \\
a_{23} \\
a_{33}
\end{array} \right] + x_{4} \left[ \begin{array}{c}
a_{14} \\
a_{24} \\
a_{34}
\end{array} \right] \\
&= \left[ \begin{array}{c}
a_{11}x_{1} + a_{12}x_{2} + a_{13}x_{3} + a_{14}x_{4} \\
a_{21}x_{1} + a_{22}x_{2} + a_{23}x_{3} + a_{24}x_{4} \\
a_{31}x_{1} + a_{32}x_{2} + a_{33}x_{3} + a_{34}x_{4}
\end{array} \right]
\end{align*}
From this we see that each entry of $A\textbf{x}$ is the dot product of the corresponding row of $A$ with $\textbf{x}$. This computation goes through in general, and we record the result in Theorem 2.2.5.

 

Theorem 2.2.5 Dot Product Rule

Let $A$ be an $m \times n$ matrix and let $\textbf{x}$ be an $n$-vector. Then each entry of the vector $A\textbf{x}$ is the dot product of the corresponding row of $A$ with $\textbf{x}$.

This result is used extensively throughout linear algebra.

If $A$ is $m \times n$ and $\textbf{x}$ is an $n$-vector, the computation of $A\textbf{x}$ by the dot product rule is simpler than using Definition 2.5 because the computation can be carried out directly with no explicit reference to the columns of $A$ (as in Definition 2.5). The first entry of $A\textbf{x}$ is the dot product of row 1 of $A$ with $\textbf{x}$. In hand calculations this is computed by going across row one of $A$, going down the column $\textbf{x}$, multiplying corresponding entries, and adding the results. The other entries of $A\textbf{x}$ are computed in the same way using the other rows of $A$ with the column $\textbf{x}$.

 

In general, compute entry $i$ of $A\textbf{x}$ as follows (see the diagram):

Go across row $i$ of $A$ and down column $\textbf{x}$, multiply corresponding entries, and add the results.

As an illustration, we rework Example 2.2.2 using the dot product rule instead of Definition 2.5.

 

 

Example 2.2.8

If $A = \left[ \begin{array}{rrrr}
2 & -1 & 3 & 5 \\
0 & 2 & -3 & 1 \\
-3 & 4 & 1 & 2
\end{array} \right]$
and $\textbf{x} = \left[ \begin{array}{r}
2 \\
1 \\
0 \\
-2
\end{array} \right]$, compute $A\textbf{x}$.

Solution:
The entries of $A\textbf{x}$ are the dot products of the rows of $A$ with $\textbf{x}$:
\begin{equation*}
A\textbf{x} = \left[ \begin{array}{rrrr}
2 & -1 & 3 & 5 \\
0 & 2 & -3 & 1 \\
-3 & 4 & 1 & 2
\end{array} \right] \left[ \begin{array}{r}
2 \\
1 \\
0 \\
-2
\end{array} \right] = \left[ \begin{array}{rrrrrrr}
2 \cdot 2 & + & (-1)1 & + & 3 \cdot 0 & + & 5(-2) \\
0 \cdot 2 & + & 2 \cdot 1 & + & (-3)0 & + & 1(-2) \\
(-3)2 & + & 4 \cdot 1 & + & 1 \cdot 0 & + & 2(-2)
\end{array} \right] = \left[ \begin{array}{r}
-7 \\
0 \\
-6
\end{array} \right]
\end{equation*}
Of course, this agrees with the outcome in Example 2.2.2.
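
The dot product rule is easy to express directly in code: each entry of $A\textbf{x}$ is one row of $A$ dotted with $\textbf{x}$. A short NumPy sketch repeating Example 2.2.8 this way (names ours):
\begin{verbatim}
import numpy as np

A = np.array([[ 2, -1,  3, 5],
              [ 0,  2, -3, 1],
              [-3,  4,  1, 2]])
x = np.array([2, 1, 0, -2])

# entry i of Ax is the dot product of row i of A with x
Ax = np.array([np.dot(A[i, :], x) for i in range(A.shape[0])])
print(Ax)   # [-7  0 -6], as in Example 2.2.8
\end{verbatim}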

Example 2.2.9

Write the following system of linear equations in the form $A\textbf{x} = \textbf{b}$.
\begin{equation*}
\arraycolsep=1pt
\begin{array}{rrrrrrrrrrr}
5x_{1} & - & x_{2} & + & 2x_{3} & + & x_{4} & - & 3x_{5} & = & 8 \\
x_{1} & + & x_{2} & + & 3x_{3} & - & 5x_{4} & + & 2x_{5} & = & -2 \\
-x_{1} & + & x_{2} & - & 2x_{3} & & & - & 3x_{5} & = & 0
\end{array}
\end{equation*}

Solution:

Write $A = \left[ \begin{array}{rrrrr}
5 & -1 & 2 & 1 & -3 \\
1 & 1 & 3 & -5 & 2 \\
-1 & 1 & -2 & 0 & -3
\end{array} \right]$, $\textbf{b} = \left[ \begin{array}{r}
8 \\
-2 \\
0
\end{array} \right]$, and $\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
x_{4} \\
x_{5}
\end{array} \right]$. Then the dot product rule gives $A\textbf{x} = \left[ \arraycolsep=1pt \begin{array}{rrrrrrrrr}
5x_{1} & - & x_{2} & + & 2x_{3} & + & x_{4} & - & 3x_{5} \\
x_{1} & + & x_{2} & + & 3x_{3} & - & 5x_{4} & + & 2x_{5} \\
-x_{1} & + & x_{2} & - & 2x_{3} & & & - & 3x_{5}
\end{array} \right]$, so the entries of $A\textbf{x}$ are the left sides of the equations in the linear system. Hence the system becomes $A\textbf{x} = \textbf{b}$ because matrices are equal if and only if corresponding entries are equal.

Example 2.2.10

If $A$ is the zero $m \times n$ matrix, then $A\textbf{x} = \textbf{0}$ for each $n$-vector $\textbf{x}$.

Solution:

For each $k$, entry $k$ of $A\textbf{x}$ is the dot product of row $k$ of $A$ with $\textbf{x}$, and this is zero because row $k$ of $A$ consists of zeros.

 

Definition 2.7 The Identity Matrix

For each $n \geq 2$, the $\textbf{identity matrix}$ $I_{n}$ is the $n \times n$ matrix with 1s on the main diagonal (upper left to lower right), and zeros elsewhere.

 

The first few identity matrices are
\begin{equation*}
I_{2} = \left[ \begin{array}{rr}
1 & 0 \\
0 & 1
\end{array} \right], \quad I_{3} = \left[ \begin{array}{rrr}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{array} \right], \quad I_{4} = \left[ \begin{array}{rrrr}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{array} \right], \quad \dots
\end{equation*}
In Example 2.2.6 we showed that $I_{3}\textbf{x} = \textbf{x}$ for each $3$-vector $\textbf{x}$ using Definition 2.5. The following result shows that this holds in general, and is the reason for the name.

Example 2.2.11

For each $n \geq 2$ we have $I_{n}\textbf{x} = \textbf{x}$ for each $n$-vector $\textbf{x}$ in $\mathbf{R}^n$.

Solution:

We verify the case $n = 4$. Given the $4$-vector $\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
x_{4}
\end{array} \right]$
the dot product rule gives
\begin{equation*}
I_{4}\textbf{x} = \left[\begin{array}{rrrr}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{array} \right] \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
x_{4}
\end{array} \right] = \left[ \begin{array}{c}
x_{1} + 0 + 0 + 0 \\
0 + x_{2} + 0 + 0 \\
0 + 0 + x_{3} + 0 \\
0 + 0 + 0 + x_{4}
\end{array} \right] = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
x_{4}
\end{array} \right] = \textbf{x}
\end{equation*}
In general, $I_{n}\textbf{x} = \textbf{x}$ because entry $k$ of $I_{n}\textbf{x}$ is the dot product of row $k$ of $I_{n}$ with $\textbf{x}$, and row $k$ of $I_{n}$ has $1$ in position $k$ and zeros elsewhere.

Example 2.2.12

Let $A = \left[ \begin{array}{cccc}
\textbf{a}_{1} & \textbf{a}_{2} & \cdots & \textbf{a}_{n}
\end{array} \right]$ be any $m \times n$ matrix with columns $\textbf{a}_{1}, \textbf{a}_{2}, \dots, \textbf{a}_{n}$. If $\textbf{e}_{j}$ denotes column $j$ of the $n \times n$ identity matrix $I_{n}$, then $A\textbf{e}_{j} = \textbf{a}_{j}$ for each $j = 1, 2, \dots, n$.

Solution:

Write $\textbf{e}_{j} = \left[ \begin{array}{c}
t_{1} \\
t_{2} \\
\vdots \\
t_{n}
\end{array} \right]$
where $t_{j} = 1$, but $t_{i} = 0$ for all $i \neq j$. Then Definition 2.5 gives
\begin{equation*}
A\textbf{e}_{j} = t_{1}\textbf{a}_{1} + \dots + t_{j}\textbf{a}_{j} + \dots + t_{n}\textbf{a}_{n} = 0 + \dots + \textbf{a}_{j} + \dots + 0 = \textbf{a}_{j}
\end{equation*}

Example 2.2.12 will be referred to later; for now we use it to prove:

Theorem 2.2.6

Let $A$ and $B$ be $m \times n$ matrices. If $A\textbf{x} = B\textbf{x}$ for all $\textbf{x}$ in $\mathbf{R}^n$, then $A = B$.

Proof:

Write $A = \left[ \begin{array}{cccc}
\textbf{a}_{1} & \textbf{a}_{2} & \cdots & \textbf{a}_{n}
\end{array} \right]$ and $B = \left[ \begin{array}{cccc}
\textbf{b}_{1} & \textbf{b}_{2} & \cdots & \textbf{b}_{n}
\end{array} \right]$ in terms of their columns. It is enough to show that $\textbf{a}_{k} = \textbf{b}_{k}$ holds for all $k$. But we are assuming that $A\textbf{e}_{k} = B\textbf{e}_{k}$, which gives $\textbf{a}_{k} = \textbf{b}_{k}$ by Example 2.2.12.

We have introduced matrix-vector multiplication as a new way to think about systems of linear equations. But it has several other uses as well. It turns out that many geometric operations can be described using matrix multiplication, and we now investigate how this happens. As a bonus, this description provides a geometric “picture” of a matrix by revealing the effect on a vector when it is multiplied by $A$. This “geometric view” of matrices is a fundamental tool in understanding them.

2.3 Matrix Multiplication

In Section 2.2 matrix-vector products were introduced. If $A$ is an $m \times n$ matrix, the product $A\textbf{x}$ was defined for any $n$-column $\textbf{x}$ in $\mathbf{R}^n$ as follows: If $A = \left[ \begin{array}{cccc}
\textbf{a}_{1} & \textbf{a}_{2} & \cdots & \textbf{a}_{n}
\end{array} \right]$ where the $\textbf{a}_{j}$ are the columns of $A$, and if $\textbf{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2} \\
\vdots \\
x_{n}
\end{array} \right]$,
Definition 2.5 reads
\begin{equation*} \tag{2.5}
A\textbf{x} = x_{1}\textbf{a}_{1} + x_{2}\textbf{a}_{2} + \cdots + x_{n}\textbf{a}_{n}
\end{equation*}
This was motivated as a way of describing systems of linear equations with coefficient matrix $A$. Indeed every such system has the form $A\textbf{x} = \textbf{b}$ where $\textbf{b}$ is the column of constants.

In this section we extend this matrix-vector multiplication to a way of multiplying matrices in general, and then investigate matrix algebra for its own sake. While it shares several properties of ordinary arithmetic, it will soon become clear that matrix arithmetic is different in a number of ways.

Definition 2.9 Matrix Multiplication

Let $A$ be an $m \times n$ matrix, let $B$ be an $n \times k$ matrix, and write $B = \left[ \begin{array}{cccc}
\textbf{b}_{1} & \textbf{b}_{2} & \cdots & \textbf{b}_{k}
\end{array} \right]$ where $\textbf{b}_{j}$ is column $j$ of $B$ for each $j$. The product matrix $AB$ is the $m \times k$ matrix defined as follows:
\begin{equation*}
AB = A \left[ \begin{array}{cccc} \textbf{b}_{1} & \textbf{b}_{2} & \cdots & \textbf{b}_{k} \end{array} \right] = \left[ \begin{array}{cccc} A\textbf{b}_{1} & A\textbf{b}_{2} & \cdots & A\textbf{b}_{k} \end{array} \right]
\end{equation*}

 

Thus the product matrix $AB$ is given in terms of its columns $A\textbf{b}_{1}, A\textbf{b}_{2}, \dots, A\textbf{b}_{k}$: Column $j$ of $AB$ is the matrix-vector product $A\textbf{b}_{j}$ of $A$ and the corresponding column $\textbf{b}_{j}$ of $B$. Note that each such product $A\textbf{b}_{j}$ makes sense by Definition 2.5 because $A$ is $m \times n$ and each $\textbf{b}_{j}$ is in $\mathbf{R}^n$ (since $B$ has $n$ rows). Note also that if $B$ is a column matrix, this definition reduces to Definition 2.5 for matrix-vector multiplication.

Given matrices $A$ and $B$ as in Definition 2.9 and any $k$-vector $\vec{x}$, Definition 2.5 gives $B\vec{x} = x_{1}\vec{b}_{1} + x_{2}\vec{b}_{2} + \cdots + x_{k}\vec{b}_{k}$, so Theorem 2.2.2 and Definition 2.9 give
\begin{equation*}
A(B\vec{x}) = x_{1}A\vec{b}_{1} + x_{2}A\vec{b}_{2} + \cdots + x_{k}A\vec{b}_{k} = \left[ \begin{array}{cccc}
A\vec{b}_{1} & A\vec{b}_{2} & \cdots & A\vec{b}_{k}
\end{array} \right] \vec{x} = (AB)\vec{x}
\end{equation*}
for all $\vec{x}$ in $\mathbf{R}^k$. We record this for reference.

Theorem 2.3.1

Let $A$ be an $m \times n$ matrix and let $B$ be an $n \times k$ matrix. Then the product matrix $AB$ is $m \times k$ and satisfies
\begin{equation*}
A(B\vec{x}) = (AB)\vec{x} \quad \mbox{ for all } \vec{x} \mbox{ in } \mathbf{R}^{k}
\end{equation*}

Here is an example of how to compute the product $AB$ of two matrices using Definition 2.9.

Example 2.3.1

Compute $AB$ if $A = \left[ \begin{array}{rrr}
2 & 3 & 5 \\
1 & 4 & 7 \\
0 & 1 & 8
\end{array} \right]$
and
$B = \left[\begin{array}{rr}
8 & 9 \\
7 & 2 \\
6 & 1
\end{array} \right]$.

Solution:

The columns of $B$ are
$\vec{b}_{1} = \left[ \begin{array}{r}
8 \\
7 \\
6
\end{array} \right]$ and $\vec{b}_{2} = \left[ \begin{array}{r}
9 \\
2 \\
1
\end{array} \right]$, so Definition 2.5 gives
\begin{equation*}
A\vec{b}_{1} = \left[ \begin{array}{rrr}
2 & 3 & 5 \\
1 & 4 & 7 \\
0 & 1 & 8
\end{array} \right] \left[ \begin{array}{r}
8 \\
7 \\
6
\end{array} \right] = \left[ \begin{array}{r}
67 \\
78 \\
55
\end{array} \right] \mbox{ and } A\vec{b}_{2} = \left[ \begin{array}{rrr}
2 & 3 & 5 \\
1 & 4 & 7 \\
0 & 1 & 8
\end{array} \right] \left[ \begin{array}{r}
9 \\
2 \\
1
\end{array} \right] = \left[\begin{array}{r}
29 \\
24 \\
10
\end{array} \right]
\end{equation*}
Hence Definition 2.9 above gives $AB = \left[ \begin{array}{cc}
A\vec{b}_{1} & A\vec{b}_{2}
\end{array} \right] = \left[ \begin{array}{rr}
67 & 29 \\
78 & 24 \\
55 & 10
\end{array} \right]$.
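
For readers who like to check such computations by machine, here is a minimal Python sketch (an illustration only, assuming matrices are stored as lists of rows) that forms $AB$ column by column, exactly as Definition 2.9 prescribes, using the matrices of Example 2.3.1.

def mat_vec(A, x):
    # Ax is the linear combination x1*a1 + ... + xn*an of the columns of A (Definition 2.5).
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result

def mat_mul(A, B):
    # Column j of AB is A times column j of B (Definition 2.9).
    columns = [mat_vec(A, [row[j] for row in B]) for j in range(len(B[0]))]
    return [[col[i] for col in columns] for i in range(len(A))]   # reassemble by rows

A = [[2, 3, 5], [1, 4, 7], [0, 1, 8]]
B = [[8, 9], [7, 2], [6, 1]]
print(mat_mul(A, B))   # [[67, 29], [78, 24], [55, 10]], as in Example 2.3.1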

 

While Definition 2.9 is important, there is another way to compute the matrix product $AB$ that gives a way to calculate each individual entry. In Section 2.2 we defined the dot product of two $n$-tuples to be the sum of the products of corresponding entries. We went on to show (Theorem 2.2.5) that if $A$ is an $m \times n$ matrix and $\vec{x}$ is an $n$-vector, then entry $j$ of the product $A\vec{x}$ is the dot product of row $j$ of $A$ with $\vec{x}$. This observation was called the “dot product rule” for matrix-vector multiplication, and the next theorem shows that it extends to matrix multiplication in general.

Theorem 2.3.2 Dot Product Rule

Let $A$ and $B$ be matrices of sizes $m \times n$ and $n \times k$, respectively. Then the $(i, j)$-entry of $AB$ is the dot
product of row $i$ of $A$ with column $j$ of $B$.

Proof:

Write $B = \left[ \begin{array}{cccc}
\vec{b}_{1} & \vec{b}_{2} & \cdots & \vec{b}_{k}
\end{array} \right]$ in terms of its columns. Then $A\vec{b}_{j}$ is column $j$ of $AB$ for each $j$. Hence the $(i, j)$-entry of $AB$ is entry $i$ of $A\vec{b}_{j}$, which is the dot product of row $i$ of $A$ with $\vec{b}_{j}$. This proves the theorem.

Thus to compute the $(i, j)$-entry of $AB$, proceed as follows (see the diagram):

Go across row $i$ of $A$, and down column $j$ of $B$, multiply corresponding entries, and add the results.

Note that this requires that the rows of $A$ must be the same length as the columns of $B$. The following rule is useful for remembering this and for deciding the size of the product matrix $AB$.

Compatibility Rule

Let $A$ and $B$ denote matrices. If $A$ is $m \times n$ and $B$ is $n^\prime \times k$, the product $AB$ can be formed if and only if $n=n^\prime$. In this case the size of the product matrix $AB$ is $m \times k$, and we say that $AB$ is defined, or that $A$ and $B$ are compatible for multiplication.

The diagram provides a useful mnemonic for remembering this. We adopt the following convention:

Whenever a product of matrices is written, it is tacitly assumed that the sizes of the factors are such that the product is defined.

To illustrate the dot product rule, we recompute the matrix product in Example 2.3.1.

Example 2.3.3

Compute $AB$ if $A = \left[ \begin{array}{rrr}
2 & 3 & 5 \\
1 & 4 & 7 \\
0 & 1 & 8
\end{array} \right]$
and $B = \left[ \begin{array}{rr}
8 & 9 \\
7 & 2 \\
6 & 1
\end{array} \right]$.

Solution:

Here $A$ is $3 \times 3$ and $B$ is $3 \times 2$, so the product matrix $AB$ is defined and will be of size $3 \times 2$. Theorem 2.3.2 gives each entry of $AB$ as the dot product of the corresponding row of $A$ with the corresponding column of $B$; that is,
\begin{equation*}
AB = \left[ \begin{array}{rrr}
2 & 3 & 5 \\
1 & 4 & 7 \\
0 & 1 & 8
\end{array} \right] \left[ \begin{array}{rr}
8 & 9 \\
7 & 2 \\
6 & 1
\end{array} \right]= \left[ \arraycolsep=8pt \begin{array}{cc}
2 \cdot 8 + 3 \cdot 7 + 5 \cdot 6 & 2 \cdot 9 + 3 \cdot 2 + 5 \cdot 1 \\
1 \cdot 8 + 4 \cdot 7 + 7 \cdot 6 & 1 \cdot 9 + 4 \cdot 2 + 7 \cdot 1 \\
0 \cdot 8 + 1 \cdot 7 + 8 \cdot 6 & 0 \cdot 9 + 1 \cdot 2 + 8 \cdot 1
\end{array} \right] = \left[ \begin{array}{rr}
67 & 29 \\
78 & 24 \\
55 & 10
\end{array} \right]
\end{equation*}
Of course, this agrees with Example 2.3.1.
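
The dot product rule translates directly into code. Here is a short Python sketch (an illustration only, with matrices stored as lists of rows) that computes each entry of $AB$ as the dot product of a row of $A$ with a column of $B$, and raises an error when the compatibility rule fails.

def mat_mul_dot(A, B):
    m, n = len(A), len(A[0])
    n_prime, k = len(B), len(B[0])
    if n != n_prime:                 # compatibility rule: columns of A must match rows of B
        raise ValueError("A and B are not compatible for multiplication")
    # (i, j)-entry of AB = dot product of row i of A with column j of B (Theorem 2.3.2)
    return [[sum(A[i][p] * B[p][j] for p in range(n)) for j in range(k)]
            for i in range(m)]

A = [[2, 3, 5], [1, 4, 7], [0, 1, 8]]
B = [[8, 9], [7, 2], [6, 1]]
print(mat_mul_dot(A, B))   # [[67, 29], [78, 24], [55, 10]], agreeing with Example 2.3.3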

Example 2.3.4

Compute the $(1, 3)$- and $(2, 4)$-entries of $AB$ where
\begin{equation*}
A = \left[ \begin{array}{rrr}
3 & -1 & 2 \\
0 & 1 & 4
\end{array} \right] \mbox{ and } B = \left[ \begin{array}{rrrr}
2 & 1 & 6 & 0 \\
0 & 2 & 3 & 4 \\
-1 & 0 & 5 & 8
\end{array} \right].
\end{equation*}
Then compute $AB$.

Solution:

The $(1, 3)$-entry of $AB$ is the dot product of row 1 of $A$ and column 3 of $B$, computed by multiplying corresponding entries and adding the results:
\begin{equation*}
3 \cdot 6 + (-1) \cdot 3 + 2 \cdot 5 = 25
\end{equation*}
Similarly, the $(2, 4)$-entry of $AB$ is the dot product of row 2 of $A$ and column 4 of $B$:
\begin{equation*}
0 \cdot 0 + 1 \cdot 4 + 4 \cdot 8 = 36
\end{equation*}

Since $A$ is $2 \times 3$ and $B$ is $3 \times 4$, the product is $2 \times 4$.
\begin{equation*}
AB = \left[ \begin{array}{rrr}
3 & -1 & 2 \\
0 & 1 & 4
\end{array} \right] \left[ \begin{array}{rrrr}
2 & 1 & 6 & 0 \\
0 & 2 & 3 & 4 \\
-1 & 0 & 5 & 8
\end{array} \right] = \left[ \begin{array}{rrrr}
4 & 1 & 25 & 12 \\
-4 & 2 & 23 & 36
\end{array} \right]
\end{equation*}

Example 2.3.5

If $A = \left[ \begin{array}{ccc}
1 & 3 & 2\end{array}\right]$ and $B = \left[ \begin{array}{r}
5 \\
6 \\
4
\end{array} \right]$, compute $A^{2}$, $AB$, $BA$, and $B^{2}$ when they are defined.

Solution:

Here, $A$ is a $1 \times 3$ matrix and $B$ is a $3 \times 1$ matrix, so $A^{2}$ and $B^{2}$ are not defined. However, the compatibility rule reads
\begin{equation*}
\begin{array}{ccc}
\begin{array}{cc}
A & B\\
1 \times 3 & 3 \times 1
\end{array}
& \mbox{ and } &
\begin{array}{cc}
B & A \\
3 \times 1 & 1 \times 3
\end{array}
\end{array}
\end{equation*}
so both $AB$ and $BA$ can be formed and these are $1 \times 1$ and $3 \times 3$ matrices, respectively.

\begin{equation*}
AB = \left[ \begin{array}{rrr}
1 & 3 & 2
\end{array} \right] \left[ \begin{array}{r}
5 \\
6 \\
4
\end{array} \right] = \left[ \begin{array}{c} 1 \cdot 5 + 3 \cdot 6 + 2 \cdot 4 \end{array} \right] = \arraycolsep=1.5pt \left[ \begin{array}{c} 31 \end{array}\right]
\end{equation*}
\begin{equation*}
BA = \left[ \begin{array}{r}
5 \\
6 \\
4
\end{array} \right] \left[ \begin{array}{rrr}
1 & 3 & 2
\end{array} \right] = \left[ \begin{array}{rrr}
5 \cdot 1 & 5 \cdot 3 & 5 \cdot 2 \\
6 \cdot 1 & 6 \cdot 3 & 6 \cdot 2 \\
4 \cdot 1 & 4 \cdot 3 & 4 \cdot 2
\end{array} \right] = \left[ \begin{array}{rrr}
5 & 15 & 10 \\
6 & 18 & 12 \\
4 & 12 & 8
\end{array} \right]
\end{equation*}

Unlike numerical multiplication, matrix products $AB$ and $BA$ need not be equal. In fact they need not even be the same size, as Example 2.3.5 shows. It turns out to be rare that $AB = BA$ (although it is by no means impossible), and $A$ and $B$ are said to commute when this happens.

 

Example 2.3.6

Let $A = \left[ \begin{array}{rr}
6 & 9 \\
-4 & -6
\end{array} \right]$ and $B = \left[ \begin{array}{rr}
1 & 2 \\
-1 & 0
\end{array} \right]$. Compute $A^{2}$, $AB$, $BA$.

Solution:
$A^{2} = \left[ \begin{array}{rr}
6 & 9 \\
-4 & -6
\end{array} \right] \left[ \begin{array}{rr}
6 & 9 \\
-4 & -6
\end{array} \right] = \left[ \begin{array}{rr}
0 & 0 \\
0 & 0
\end{array} \right]$, so $A^{2} = 0$ can occur even if $A \neq 0$. Next,
\begin{align*}
AB & = \left[ \begin{array}{rr}
6 & 9 \\
-4 & -6
\end{array} \right] \left[ \begin{array}{rr}
1 & 2 \\
-1 & 0
\end{array} \right] = \left[ \begin{array}{rr}
-3 & 12 \\
2 & -8
\end{array} \right] \\
BA & = \left[ \begin{array}{rr}
1 & 2 \\
-1 & 0
\end{array} \right] \left[ \begin{array}{rr}
6 & 9 \\
-4 & -6
\end{array} \right] = \left[ \begin{array}{rr}
-2 & -3 \\
-6 & -9
\end{array} \right]
\end{align*}
Hence $AB \neq BA$, even though $AB$ and $BA$ are the same size.
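
These computations are easy to confirm numerically. The following Python sketch (an illustration only) checks, for the matrices of Example 2.3.6, that $A^{2}$ is the zero matrix even though $A \neq 0$, and that $AB \neq BA$.

def mat_mul(A, B):
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[6, 9], [-4, -6]]
B = [[1, 2], [-1, 0]]
print(mat_mul(A, A))                    # [[0, 0], [0, 0]]: A^2 = 0 although A is not zero
print(mat_mul(A, B))                    # [[-3, 12], [2, -8]]
print(mat_mul(B, A))                    # [[-2, -3], [-6, -9]]
print(mat_mul(A, B) == mat_mul(B, A))   # False: A and B do not commute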

 

 

Example 2.3.7

If $A$ is any matrix, then $IA = A$ and $AI = A$, where $I$ denotes an identity matrix of a size such that the multiplications are defined.

Solution:

These both follow from the dot product rule as the reader should verify. For a more formal proof, write $A = \left[ \begin{array}{rrrr}
\vec{a}_{1} & \vec{a}_{2} & \cdots & \vec{a}_{n}
\end{array} \right]$ where $\vec{a}_{j}$ is column $j$ of $A$. Then Definition 2.9 and Example 2.2.1 give
\begin{equation*}
IA = \left[ \begin{array}{rrrr}
I\vec{a}_{1} & I\vec{a}_{2} & \cdots & I\vec{a}_{n}
\end{array} \right] = \left[ \begin{array}{rrrr}
\vec{a}_{1} & \vec{a}_{2} & \cdots & \vec{a}_{n}
\end{array} \right] = A
\end{equation*}
If $\vec{e}_{j}$ denotes column $j$ of $I$, then $A\vec{e}_{j} = \vec{a}_{j}$ for each $j$ by Example 2.2.12. Hence Definition 2.9 gives:
\begin{equation*}
AI = A \left[ \begin{array}{rrrr}
\vec{e}_{1} & \vec{e}_{2} & \cdots & \vec{e}_{n}
\end{array} \right] = \left[ \begin{array}{rrrr}
A\vec{e}_{1} & A\vec{e}_{2} & \cdots & A\vec{e}_{n}
\end{array} \right] = \left[ \begin{array}{rrrr}
\vec{a}_{1} & \vec{a}_{2} & \cdots & \vec{a}_{n}
\end{array} \right] = A
\end{equation*}

The following theorem collects several results about matrix multiplication that are used everywhere in linear algebra.

Theorem 2.3.3

Assume that $a$ is any scalar, and that $A$, $B$, and $C$ are matrices of sizes such that the indicated matrix products are defined. Then:

1. $IA = A$ and $AI = A$ where $I$ denotes an identity matrix.

2. $A(BC) = (AB)C$.

3. $A(B + C) = AB + AC$.

4. $(B + C)A = BA + CA$.

5. $a(AB) = (aA)B = A(aB)$.

6. $(AB)^{T} = B^{T}A^{T}$.

Proof:

Condition (1) is Example 2.3.7; we prove (2), (4), and (6) and leave (3) and (5) as exercises.
2. If $C = \left[ \begin{array}{cccc}
\vec{c}_{1} & \vec{c}_{2} & \cdots & \vec{c}_{k}
\end{array} \right]$ in terms of its columns, then $BC = \left[ \begin{array}{cccc}
B\vec{c}_{1} & B\vec{c}_{2} & \cdots & B\vec{c}_{k}
\end{array} \right]$ by Definition 2.9, so
\begin{equation*}
\begin{array}{lllll}
A(BC) & = & \left[ \begin{array}{rrrr}
A(B\vec{c}_{1}) & A(B\vec{c}_{2}) & \cdots & A(B\vec{c}_{k})
\end{array} \right] & & \mbox{Definition 2.9} \\
& & & & \\
& = & \left[ \begin{array}{rrrr}
(AB)\vec{c}_{1} & (AB)\vec{c}_{2} & \cdots & (AB)\vec{c}_{k}
\end{array} \right] & & \mbox{Theorem 2.3.1} \\
& & & & \\
& = & (AB)C & & \mbox{Definition 2.9}
\end{array}
\end{equation*}

4. We know (Theorem 2.2.2) that $(B + C)\vec{x} = B\vec{x} + C\vec{x}$ holds for every column $\vec{x}$. If we write $A = \left[ \begin{array}{rrrr}
\vec{a}_{1} & \vec{a}_{2} & \cdots & \vec{a}_{n}
\end{array} \right]$ in terms of its columns, we get
\begin{equation*}
\begin{array}{lllll}
(B + C)A & = & \left[ \begin{array}{rrrr}
(B + C)\vec{a}_{1} & (B + C)\vec{a}_{2} & \cdots & (B + C)\vec{a}_{n}
\end{array} \right] & & \mbox{Definition 2.9} \\
& & & & \\
& = & \left[ \begin{array}{rrrr}
B\vec{a}_{1} + C\vec{a}_{1} & B\vec{a}_{2} + C\vec{a}_{2} & \cdots & B\vec{a}_{n} + C\vec{a}_{n}
\end{array} \right] & & \mbox{Theorem 2.2.2} \\
& & & & \\
& = & \left[ \begin{array}{rrrr}
B\vec{a}_{1} & B\vec{a}_{2} & \cdots & B\vec{a}_{n}
\end{array} \right] + \left[ \begin{array}{rrrr}
C\vec{a}_{1} & C\vec{a}_{2} & \cdots & C\vec{a}_{n}
\end{array} \right] & & \mbox{Adding Columns} \\
& & & & \\
& = & BA + CA & & \mbox{Definition 2.9}
\end{array}
\end{equation*}

6. As in Section 2.1, write $A = [a_{ij}]$ and $B = [b_{ij}]$, where $A$ is $m \times n$ and $B$ is $n \times k$, so that $A^{T} = [a^\prime_{ij}]$ and $B^{T} = [b^\prime_{ij}]$ where $a^\prime_{ij} = a_{ji}$ and $b^\prime_{ij} = b_{ji}$ for all $i$ and $j$. If $c_{ij}$ denotes the $(i, j)$-entry of $B^{T}A^{T}$, then $c_{ij}$ is the dot product of row $i$ of $B^{T}$ with column $j$ of $A^{T}$. Hence
\begin{align*}
c_{ij} = b_{i1}^\prime a_{1j}^\prime + b_{i2}^\prime a_{2j}^\prime + \cdots + b_{in}^\prime a_{nj}^\prime &= b_{1i} a_{j1} + b_{2i} a_{j2} + \cdots + b_{ni} a_{jn} \\
&= a_{j1}b_{1i} + a_{j2}b_{2i} + \cdots + a_{jn}b_{ni}
\end{align*}
But this is the dot product of row $j$ of $A$ with column $i$ of $B$; that is, the $(j, i)$-entry of $AB$; that is, the $(i, j)$-entry of $(AB)^{T}$. This proves (6).

Property 2 in Theorem 2.3.3 is called the associative law of matrix multiplication. It asserts that the equation $A(BC) = (AB)C$ holds for all matrices (if the products are defined). Hence this product is the same no matter how it is formed, and so is written simply as $ABC$. This extends: The product $ABCD$ of four matrices can be formed several ways—for example, $(AB)(CD)$, $[A(BC)]D$, and $A[B(CD)]$—but the associative law implies that they are all equal and so are written as $ABCD$. A similar remark applies in general: Matrix products can be written unambiguously with no parentheses.
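
The associative law is also easy to test numerically (though a test is of course not a proof). The following Python sketch (an illustration only) multiplies three matrices of compatible sizes both ways and confirms that the results agree.

def mat_mul(A, B):
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[3, -1, 2], [0, 1, 4]]        # 2 x 3
B = [[2, 1], [0, 2], [-1, 0]]      # 3 x 2
C = [[1, 2, 0], [3, -1, 1]]        # 2 x 3
print(mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C))   # True: A(BC) = (AB)C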

However, a note of caution about matrix multiplication is needed: the fact that $AB$ and $BA$ need not be equal means that the order of the factors in a product of matrices is important. For example, $ABCD$ and $ADCB$ may not be equal.

 

Warning:
If the order of the factors in a product of matrices is changed, the product matrix may change (or may not be defined). Ignoring this warning is a source of many errors by students of linear algebra!

Properties 3 and 4 in Theorem 2.3.3 are called distributive laws. They assert that $A(B + C) = AB + AC$ and $(B + C)A = BA + CA$ hold whenever the sums and products are defined. These rules extend to more than two terms and, together with Property 5, ensure that many manipulations familiar from ordinary algebra extend to matrices. For example
\begin{align*}
A(2B - 3C + D - 5E) & = 2AB - 3AC + AD - 5AE \\
(A + 3C - 2D)B & = AB + 3CB - 2DB
\end{align*}
Note again that the warning is in effect: For example $A(B - C)$ need not equal $AB - CA$. These rules make possible a lot of simplification of matrix expressions.

 

Example 2.3.8

Simplify the expression $A(BC - CD) + A(C - B)D - AB(C - D)$.

Solution:

\begin{align*}
A(BC - CD) + A(C - B)D - AB(C - D) &= A(BC) - A(CD) + (AC - AB)D - (AB)C + (AB)D \\
&= ABC - ACD + ACD - ABD - ABC + ABD \\
&= 0
\end{align*}

 

 

Example 2.3.9 and Example 2.3.10 below show how we can use the properties in Theorem 2.3.3 to deduce other facts about matrix multiplication. Matrices $A$ and $B$ are said to commute if $AB = BA$.

 

Example 2.3.9

Suppose that $A$, $B$, and $C$ are $n \times n$ matrices and that both $A$ and $B$ commute with $C$; that is, $AC = CA$ and $BC = CB$. Show that $AB$ commutes with $C$.

Solution:
Showing that $AB$ commutes with $C$ means verifying that $(AB)C = C(AB)$. The computation uses the associative law several times, as well as the given facts that $AC = CA$ and $BC = CB$.
\begin{equation*}
(AB)C = A(BC) = A(CB) = (AC)B = (CA)B = C(AB)
\end{equation*}

 

Example 2.3.10

Show that $AB = BA$ if and only if $(A - B)(A + B) = A^{2} - B^{2}$.

Solution:
The following always holds:
\begin{equation*} \tag{2.6}
(A - B)(A + B) = A(A + B) - B(A + B) = A^{2} + AB - BA - B^{2}
\end{equation*}
Hence if $AB = BA$, then $(A - B)(A + B) = A^{2} - B^{2}$ follows. Conversely, if this last equation holds, then equation (2.6) becomes
\begin{equation*}
A^{2} - B^{2} = A^{2} + AB - BA - B^{2}
\end{equation*}
This gives $0 = AB - BA$, and $AB = BA$ follows.
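
A quick numerical check makes the role of commutativity visible. The following Python sketch (an illustration only) uses the non-commuting matrices of Example 2.3.6, for which $(A - B)(A + B)$ and $A^{2} - B^{2}$ indeed differ.

def mat_mul(A, B):
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B, sign=1):
    # Entrywise A + B (or A - B when sign = -1).
    return [[A[i][j] + sign * B[i][j] for j in range(len(A[0]))] for i in range(len(A))]

A = [[6, 9], [-4, -6]]
B = [[1, 2], [-1, 0]]
lhs = mat_mul(mat_add(A, B, -1), mat_add(A, B))    # (A - B)(A + B)
rhs = mat_add(mat_mul(A, A), mat_mul(B, B), -1)    # A^2 - B^2
print(lhs)          # [[0, 13], [9, 3]]
print(rhs)          # [[1, -2], [1, 2]]
print(lhs == rhs)   # False, because AB != BA here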

In Section 2.2 we saw (in Theorem 2.2.1) that every system of linear equations has the form
\begin{equation*}
A\vec{x} = \vec{b}
\end{equation*}
where $A$ is the coefficient matrix, $\vec{x}$ is the column of variables, and $\vec{b}$ is the constant matrix. Thus the system of linear equations becomes a single matrix equation. Matrix multiplication can yield information about such a system.

Example 2.3.11

Consider a system $A\vec{x} = \vec{b}$ of linear equations where $A$ is an $m \times n$ matrix. Assume that a matrix $C$ exists such that $CA = I_{n}$. If the system $A\vec{x} = \vec{b}$ has a solution, show that this solution must be $C\vec{b}$. Give a condition guaranteeing that $C\vec{b}$ is in fact a solution.

Solution:

Suppose that $\vec{x}$ is any solution to the system, so that $A\vec{x} = \vec{b}$. Multiply both sides of this matrix equation by $C$ to obtain, successively,
\begin{equation*}
C(A\vec{x}) = C\vec{b}, \quad (CA)\vec{x} = C\vec{b}, \quad I_{n}\vec{x} = C\vec{b}, \quad \vec{x} = C\vec{b}
\end{equation*}
This shows that if the system has a solution $\vec{x}$, then that solution must be $\vec{x} = C\vec{b}$, as required. But it does not guarantee that the system has a solution. However, if we write $\vec{x}_{1} = C\vec{b}$, then
\begin{equation*}
A\vec{x}_{1} = A(C\vec{b}) = (AC)\vec{b}
\end{equation*}
Thus $\vec{x}_{1} = C\vec{b}$ will be a solution if the condition $AC = I_{m}$ is satisfied.

The ideas in Example 2.3.11 lead to important information about matrices; this will be pursued in the next section.

2.4 Matrix Inverse

Three basic operations on matrices, addition, multiplication, and subtraction, are analogs for matrices of the same operations for numbers. In this section we introduce the matrix analog of numerical division.

To begin, consider how a numerical equation $ax = b$ is solved when $a$ and $b$ are known numbers. If $a = 0$, there is no solution (unless $b = 0$). But if $a \neq 0$, we can multiply both sides by the inverse $a^{-1} = \frac{1}{a}$ to obtain the solution $x = a^{-1}b$. Of course multiplying by $a^{-1}$ is just dividing by $a$, and the property of $a^{-1}$ that makes this work is that $a^{-1}a = 1$. Moreover, we saw in Section 2.2 that the role that $1$ plays in arithmetic is played in matrix algebra by the identity matrix $I$. This suggests the following definition.

 

Definition 2.11 Matrix Inverses

If $A$ is a square matrix, a matrix $B$ is called an inverse of $A$ if and only if
\begin{equation*}
AB = I \quad \mbox{ and } \quad BA = I
\end{equation*}
A matrix $A$ that has an inverse is called an $\textbf{invertible matrix}$.

Note that only square matrices have inverses. Even though it is plausible that nonsquare matrices $A$ and $B$ could exist such that $AB = I_{m}$ and $BA = I_{n}$, where $A$ is $m \times n$ and $B$ is $n \times m$, we claim that this forces $n = m$. Indeed, if $m < n$ there exists a nonzero column $\vec{x}$ such that $A\vec{x} = \vec{0}$ (by Theorem 1.3.1), so $\vec{x} = I_{n}\vec{x} = (BA)\vec{x} = B(A\vec{x}) = B(\vec{0}) = \vec{0}$, a contradiction. Hence $m \geq n$. Similarly, the condition $AB = I_{m}$ implies that $n \geq m$. Hence $m = n$ so $A$ is square.

 

Example 2.4.1

Show that $B = \left[ \begin{array}{rr}
-1 & 1 \\
1 & 0
\end{array} \right]$
is an inverse of $A = \left[ \begin{array}{rr}
0 & 1 \\
1 & 1
\end{array} \right]$.

Solution:

Compute $AB$ and $BA$.
\begin{equation*}
AB = \left[ \begin{array}{rr}
0 & 1 \\
1 & 1
\end{array} \right] \left[ \begin{array}{rr}
-1 & 1 \\
1 & 0
\end{array} \right] = \left[ \begin{array}{rr}
1 & 0 \\
0 & 1
\end{array} \right] \quad
BA = \left[ \begin{array}{rr}
-1 & 1 \\
1 & 0
\end{array} \right] \left[ \begin{array}{rr}
0 & 1 \\
1 & 1
\end{array} \right] = \left[ \begin{array}{rr}
1 & 0 \\
0 & 1
\end{array} \right]
\end{equation*}
Hence $AB = I = BA$, so $B$ is indeed an inverse of $A$.

 

 

Example 2.4.2

Show that $A = \left[ \begin{array}{rr}
0 & 0 \\
1 & 3
\end{array} \right]$
has no inverse.

Solution:
Let $B = \left[ \begin{array}{rr}
a & b \\
c & d
\end{array} \right]$
denote an arbitrary $2 \times 2$ matrix. Then
\begin{equation*}
AB = \left[ \begin{array}{rr}
0 & 0 \\
1 & 3
\end{array} \right] \left[ \begin{array}{rr}
a & b \\
c & d
\end{array} \right] = \left[ \begin{array}{cc}
0 & 0 \\
a + 3c & b + 3d
\end{array} \right]
\end{equation*}
so $AB$ has a row of zeros. Hence $AB$ cannot equal $I$ for any $B$.

The argument in Example 2.4.2 shows that no zero matrix has an inverse. But Example 2.4.2 also shows that, unlike arithmetic, it is possible for a nonzero matrix to have no inverse. However, if a matrix does have an inverse, it has only one.

 

Theorem 2.4.1

If $B$ and $C$ are both inverses of $A$, then $B = C$.

Proof:

Since $B$ and $C$ are both inverses of $A$, we have $CA = I = AB$. Hence
\begin{equation*}
B = IB = (CA)B = C(AB) = CI = C
\end{equation*}

If $A$ is an invertible matrix, the (unique) inverse of $A$ is denoted $A^{-1}$. Hence $A^{-1}$ (when it exists) is a square matrix of the same size as $A$ with the property that
\begin{equation*}
AA^{-1} = I \quad \mbox{ and } \quad A^{-1}A = I
\end{equation*}
These equations characterize $A^{-1}$ in the following sense:

Inverse Criterion: If somehow a matrix $B$ can be found such that $AB = I$ and $BA = I$, then $A$ is invertible and $B$ is the inverse of $A$; in symbols, $B = A^{-1}$.

This is a way to verify that the inverse of a matrix exists. Example 2.4.3 and Example 2.4.4 offer illustrations.

Example 2.4.3

If $A = \left[ \begin{array}{rr}
0 & -1 \\
1 & -1
\end{array} \right]$, show that $A^{3} = I$ and so find $A^{-1}$.

Solution:

We have $A^{2} = \left[ \begin{array}{rr}
0 & -1 \\
1 & -1
\end{array} \right] \left[ \begin{array}{rr}
0 & -1 \\
1 & -1
\end{array} \right] = \left[ \begin{array}{rr}
-1 & 1 \\
-1 & 0
\end{array} \right]$, and so
\begin{equation*}
A^{3} = A^{2}A = \left[ \begin{array}{rr}
-1 & 1 \\
-1 & 0
\end{array} \right] \left[ \begin{array}{rr}
0 & -1 \\
1 & -1
\end{array} \right] = \left[ \begin{array}{rr}
1 & 0 \\
0 & 1
\end{array} \right] = I
\end{equation*}
Hence $A^{3} = I$, as asserted. This can be written as $A^{2}A = I = AA^{2}$, so it shows that $A^{2}$ is the inverse of $A$. That is, $A^{-1} = A^{2} = \left[ \begin{array}{rr}
-1 & 1 \\
-1 & 0
\end{array} \right]$.

The next example presents a useful formula for the inverse of a $2 \times 2$ matrix $A = \left[ \begin{array}{cc}
a & b \\
c & d
\end{array} \right]$ when it exists. To state it, we define the $\textbf{determinant}$ $\func{det }A$ and the $\textbf{adjugate}$  $\func{adj }A$ of the matrix $A$ as follows:
\begin{equation*}
\func{det}\left[ \begin{array}{cc}
a & b \\
c & d
\end{array} \right] = ad - bc, \quad \mbox{ and } \quad \func{adj} \left[ \begin{array}{cc}
a & b \\
c & d
\end{array} \right] = \left[ \begin{array}{rr}
d & -b \\
-c & a
\end{array} \right]
\end{equation*}

 

Example 2.4.4

If $A = \left[ \begin{array}{cc}
a & b \\
c & d
\end{array} \right]$, show that $A$ has an inverse if and only if $\func{det } A \neq 0$, and in this case
\begin{equation*}
A^{-1} = \frac{1}{\func{det } A} \func{adj } A
\end{equation*}

Solution:
For convenience, write $e = \func{det } A = ad - bc$ and
$B = \func{adj } A = \left[ \begin{array}{rr}
d & -b \\
-c & a
\end{array} \right]$. Then $AB = eI = BA$ as the reader can verify. So if $e \neq 0$, scalar multiplication by $\frac{1}{e}$ gives
\begin{equation*}
A(\frac{1}{e}B) = I = (\frac{1}{e}B)A
\end{equation*}
Hence $A$ is invertible and $A^{-1} = \frac{1}{e}B$. Thus it remains only to show that if $A^{-1}$ exists, then $e \neq 0$.

We prove this by showing that assuming $e = 0$ leads to a contradiction. In fact, if $e = 0$, then $AB = eI = 0$, so left multiplication by $A^{-1}$ gives $A^{-1}AB = A^{-1}0$; that is, $IB = 0$, so $B = 0$. But this implies that $a$, $b$, $c$, and $d$ are all zero, so $A = 0$, contrary to the assumption that $A^{-1}$ exists.

As an illustration, if $A = \left[ \begin{array}{rr}
2 & 4 \\
-3 & 8
\end{array} \right]$
then $\func{det } A = 2 \cdot 8 - 4 \cdot (-3) = 28 \neq 0$. Hence $A$ is invertible and $A^{-1} = \frac{1}{\func{det } A} \func{adj } A = \frac{1}{28} \left[ \begin{array}{rr}
8 &-4 \\
3 & 2
\end{array} \right]$, as the reader is invited to verify.

The determinant and adjugate will be defined in Chapter 3 for any square matrix, and the conclusions in Example 2.4.4 will be proved in full generality.
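
The formula in Example 2.4.4 is simple enough to code directly. The following Python sketch (an illustration only) computes the inverse of a $2 \times 2$ matrix from its determinant and adjugate, and reproduces the numerical illustration above.

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c                   # det A = ad - bc
    if det == 0:
        raise ValueError("A is not invertible")
    # A^{-1} = (1/det A) * adj A, where adj A = [[d, -b], [-c, a]]
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 4], [-3, 8]]
print(inverse_2x2(A))   # approximately [[0.286, -0.143], [0.107, 0.071]], that is (1/28) adj A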

Inverse and Linear systems

Matrix inverses can be used to solve certain systems of linear equations. Recall that a $\textit{system}$ of linear equations can be written as a $\textit{single}$ matrix equation
\begin{equation*}
A\vec{x} = \vec{b}
\end{equation*}
where $A$ and $\vec{b}$ are known and $\vec{x}$ is to be determined. If $A$ is invertible, we multiply each side of the equation on the left by $A^{-1}$ to get
\begin{align*}
A^{-1}A\vec{x} &= A^{-1}\vec{b} \\
I\vec{x} &= A^{-1}\vec{b} \\
\vec{x} &= A^{-1}\vec{b}
\end{align*}
This gives the solution to the system of equations (the reader should verify that $\vec{x} = A^{-1}\vec{b}$ really does satisfy $A\vec{x} = \vec{b}$). Furthermore, the argument shows that if $\vec{x}$ is $\textit{any}$ solution, then necessarily $\vec{x} = A^{-1}\vec{b}$, so the solution is unique. Of course the technique works only when the coefficient matrix $A$ has an inverse. This proves Theorem 2.4.2.

 

Theorem 2.4.2

Suppose a system of $n$ equations in $n$ variables is written in matrix form as
\begin{equation*}
A\vec{x} = \vec{b}
\end{equation*}
If the $n \times n$ coefficient matrix $A$ is invertible, the system has the unique solution
\begin{equation*}
\vec{x} = A^{-1}\vec{b}
\end{equation*}

 

Example 2.4.5

Use Example 2.4.4 to solve the system $\left\lbrace \arraycolsep=1pt \begin{array}{rrrrr}
5x_{1} & - & 3x_{2} & = & -4 \\
7x_{1} & + & 4x_{2} & = & 8
\end{array} \right.$.

Solution:

In matrix form this is $A\vec{x} = \vec{b}$ where $A = \left[ \begin{array}{rr}
5 & -3 \\
7 & 4
\end{array} \right]$, $\vec{x} = \left[ \begin{array}{c}
x_{1} \\
x_{2}
\end{array} \right]$, and $\vec{b} = \left[ \begin{array}{r}
-4 \\
8
\end{array} \right]$. Then $\func{det } A = 5 \cdot 4 – (-3) \cdot 7 = 41$, so $A$ is invertible and $A^{-1} = \frac{1}{41} \left[ \begin{array}{rr}
4 & 3 \\
-7 & 5
\end{array} \right]$
by Example 2.4.4. Thus Theorem 2.4.2 gives
\begin{equation*}
\vec{x} = A^{-1}\vec{b} = \frac{1}{41} \left[ \begin{array}{rr}
4 & 3 \\
-7 & 5
\end{array} \right] \left[ \begin{array}{r}
-4 \\
8
\end{array} \right] = \frac{1}{41} \left[ \begin{array}{r}
8 \\
68
\end{array} \right]
\end{equation*}
so the solution is $x_{1} = \frac{8}{41}$ and $x_{2} = \frac{68}{41}$.
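
The same computation is easy to carry out in code. The following Python sketch (an illustration only) applies Theorem 2.4.2 to the system of Example 2.4.5: it builds $A^{-1}$ from the $2 \times 2$ formula of Example 2.4.4 and then forms $\vec{x} = A^{-1}\vec{b}$.

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[5, -3], [7, 4]]
b = [-4, 8]
print(mat_vec(inverse_2x2(A), b))   # [0.1951..., 1.6585...], that is x1 = 8/41 and x2 = 68/41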

An inversion method

If a matrix $A$ is $n \times n$ and invertible, it is desirable to have an efficient technique for finding the inverse. The following procedure will be justified in Section 2.5.

Matrix Inversion Algorithm

If $A$ is an invertible (square) matrix, there exists a sequence of elementary row operations that carry $A$ to the identity matrix $I$ of the same size, written $A \to I$. This same series of row operations carries $I$ to $A^{-1}$; that is, $I \to A^{-1}$. The algorithm can be summarized as follows:
\begin{equation*}
\left[ \begin{array}{cc}
A & I
\end{array} \right] \rightarrow
\left[ \begin{array}{cc}
I & A^{-1}
\end{array} \right]
\end{equation*}
where the row operations on $A$ and $I$ are carried out simultaneously.

 

Example 2.4.6

Use the inversion algorithm to find the inverse of the matrix
\begin{equation*}
A = \left[ \begin{array}{rrr}
2 & 7 & 1 \\
1 & 4 & -1 \\
1 & 3 & 0
\end{array} \right]
\end{equation*}

Solution:

Apply elementary row operations to the double matrix
\begin{equation*}
\left[ \begin{array}{cc}
A & I
\end{array} \right] = \left[ \begin{array}{rrr|rrr}
2 & 7 & 1 & 1 & 0 & 0 \\
1 & 4 & -1 & 0 & 1 & 0 \\
1 & 3 & 0 & 0 & 0 & 1
\end{array} \right]
\end{equation*}
so as to carry $A$ to $I$. First interchange rows 1 and 2.
\begin{equation*}
\left[ \begin{array}{rrr|rrr}
1 & 4 & -1 & 0 & 1 & 0 \\
2 & 7 & 1 & 1 & 0 & 0 \\
1 & 3 & 0 & 0 & 0 & 1
\end{array} \right]
\end{equation*}
Next subtract $2$ times row 1 from row 2, and subtract row 1 from row 3.
\begin{equation*}
\left[ \begin{array}{rrr|rrr}
1 & 4 & -1 & 0 & 1 & 0 \\
0 & -1 & 3 & 1 & -2 & 0 \\
0 & -1 & 1 & 0 & -1 & 1
\end{array} \right]
\end{equation*}
Continue to reduced row-echelon form.
\begin{equation*}
\left[ \begin{array}{rrr|rrr}
1 & 0 & 11 & 4 & -7 & 0 \\
0 & 1 & -3 & -1 & 2 & 0 \\
0 & 0 & -2 & -1 & 1 & 1
\end{array} \right]
\end{equation*}
\begin{equation*}
\left[ \def\arraystretch{1.5} \begin{array}{rrr|rrr}
1 & 0 & 0 & \frac{-3}{2} & \frac{-3}{2} & \frac{11}{2} \\
0 & 1 & 0 & \frac{1}{2} & \frac{1}{2} & \frac{-3}{2} \\
0 & 0 & 1 & \frac{1}{2} & \frac{-1}{2} & \frac{-1}{2}
\end{array} \right]
\end{equation*}
Hence $A^{-1} = \frac{1}{2} \left[ \begin{array}{rrr}
-3 & -3 & 11 \\
1 & 1 & -3 \\
1 & -1 & -1
\end{array} \right]$, as is readily verified.
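
The matrix inversion algorithm itself is short to implement. The following Python sketch (an illustration only, using floating-point arithmetic) row-reduces the double matrix $\left[ \begin{array}{cc} A & I \end{array} \right]$; when $A$ can be carried to $I$, the right half ends up holding $A^{-1}$, as in Example 2.4.6.

def invert(A):
    n = len(A)
    # Form the double matrix [A | I].
    M = [list(map(float, A[i])) + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero pivot in this column and swap it up.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("A cannot be carried to I, so it is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Make the pivot 1, then clear the rest of the column.
        p = M[col][col]
        M[col] = [entry / p for entry in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]   # the right half is now A^{-1}

A = [[2, 7, 1], [1, 4, -1], [1, 3, 0]]
print(invert(A))   # [[-1.5, -1.5, 5.5], [0.5, 0.5, -1.5], [0.5, -0.5, -0.5]]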

Given any $n \times n$ matrix $A$, Theorem 1.2.1 shows that $A$ can be carried by elementary row operations to a matrix $R$ in reduced row-echelon form. If $R = I$, the matrix $A$ is invertible (this will be proved in the next section), so the algorithm produces $A^{-1}$. If $R \neq I$, then $R$ has a row of zeros (it is square), so no system of linear equations $A\vec{x} = \vec{b}$ can have a unique solution. But then $A$ is not invertible by Theorem 2.4.2. Hence, the algorithm is effective in the sense conveyed in Theorem 2.4.3.

 

Theorem 2.4.3

If $A$ is an $n \times n$ matrix, either $A$ can be reduced to $I$ by elementary row operations or it cannot. In the
first case, the algorithm produces $A^{-1}$; in the second case, $A^{-1}$ does not exist.

 

Properties of inverses

The following properties of an invertible matrix are used everywhere.

Example 2.4.7: Cancellation Laws

Let $A$ be an invertible matrix. Show that:

1. If $AB = AC$, then $B = C$.

2. If $BA = CA$, then $B = C$.

Solution:

Given the equation $AB = AC$, left multiply both sides by $A^{-1}$ to obtain $A^{-1}AB = A^{-1}AC$. Thus $IB = IC$, that is $B = C$. This proves (1) and the proof of (2) is left to the reader.

Properties (1) and (2) in Example 2.4.7 are described by saying that an invertible matrix can be “left cancelled” and “right cancelled”, respectively. Note however that “mixed” cancellation does not hold in general: If $A$ is invertible and $AB = CA$, then $B$ and $C$ may $\textit{not}$ be equal, even if both are $2 \times 2$. Here is a specific example:
\begin{equation*}
A = \left[ \begin{array}{rr}
1 & 1 \\
0 & 1
\end{array} \right],\ B = \left[ \begin{array}{rr}
0 & 0 \\
1 & 2
\end{array} \right], C = \left[ \begin{array}{rr}
1 & 1 \\
1 & 1
\end{array} \right]
\end{equation*}
Sometimes the inverse of a matrix is given by a formula. Example 2.4.4 is one illustration; Example 2.4.8 and Example 2.4.9 provide two more. The idea is the $\textit{Inverse Criterion}$: If a matrix $B$ can be found such that $AB = I = BA$, then $A$ is invertible and $A^{-1} = B$.

Example 2.4.8

If $A$ is an invertible matrix, show that the transpose $A^{T}$ is also invertible. Show further that the inverse of $A^{T}$ is just the transpose of $A^{-1}$; in symbols, $(A^{T})^{-1} = (A^{-1})^{T}$.

Solution:

$A^{-1}$ exists (by assumption). Its transpose $(A^{-1})^{T}$ is the candidate proposed for the inverse of $A^{T}$. Using the inverse criterion, we test it as follows:
\begin{equation*}
\arraycolsep=1pt
\begin{array}{lllllll}
A^{T}(A^{-1})^{T} & = & (A^{-1}A)^{T} & = & I^{T} & = & I \\
(A^{-1})^{T}A^{T} & = & (AA^{-1})^{T} & = & I^{T} & = & I
\end{array}
\end{equation*}
Hence $(A^{-1})^{T}$ is indeed the inverse of $A^{T}$; that is, $(A^{T})^{-1} = (A^{-1})^{T}$.

 

 

 

Example 2.4.9

If $A$ and $B$ are invertible $n \times n$ matrices, show that their product $AB$ is also invertible and $(AB)^{-1} = B^{-1}A^{-1}$.

Solution:

We are given a candidate for the inverse of $AB$, namely $B^{-1}A^{-1}$. We test it as follows:
\begin{align*}
(B^{-1}A^{-1})(AB) &= B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I \\
(AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I
\end{align*}
Hence $B^{-1}A^{-1}$ is the inverse of $AB$; in symbols, $(AB)^{-1} = B^{-1}A^{-1}$.

We now collect several basic properties of matrix inverses for reference.

Theorem 2.4.4

All the following matrices are square matrices of the same size.

1. $I$ is invertible and $I^{-1} = I$.

2. If $A$ is invertible, so is $A^{-1}$, and $(A^{-1})^{-1} = A$.

3. If $A$ and $B$ are invertible, so is $AB$, and $(AB)^{-1} = B^{-1}A^{-1}$.

4. If $A_{1}, A_{2}, \dots, A_{k}$ are all invertible, so is their product $A_{1}A_{2} \cdots A_{k}$, and
\begin{equation*}
(A_{1}A_{2} \cdots A_{k})^{-1} = A_{k}^{-1} \cdots A_{2}^{-1}A_{1}^{-1}.
\end{equation*}

5. If $A$ is invertible, so is $A^k$ for any $k \geq 1$, and $(A^{k})^{-1} = (A^{-1})^{k}$.

6. If $A$ is invertible and $a \neq 0$ is a number, then $aA$ is invertible and $(aA)^{-1} = \frac{1}{a}A^{-1}$.

7. If $A$ is invertible, so is its transpose $A^{T}$, and $(A^{T})^{-1} = (A^{-1})^{T}$.

 

Proof:
1. This is an immediate consequence of the fact that $I^{2} = I$.

2. The equations $AA^{-1} = I = A^{-1}A$ show that $A$ is the inverse of $A^{-1}$; in symbols, $(A^{-1})^{-1} = A$.

3. This is Example 2.4.9.

4. Use induction on $k$. If $k = 1$, there is nothing to prove, and if $k = 2$, the result is property 3. If $k > 2$, assume inductively that $(A_1A_2 \cdots A_{k-1})^{-1} = A_{k-1}^{-1} \cdots A_2^{-1}A_1^{-1}$. We apply this fact together with property 3 as follows:
\begin{align*}
\left[ A_{1}A_{2} \cdots A_{k-1}A_{k} \right]^{-1}
&= \left[ \left(A_{1}A_{2} \cdots A_{k-1}\right)A_{k} \right]^{-1} \\
&= A_{k}^{-1}\left(A_{1}A_{2} \cdots A_{k-1}\right)^{-1} \\
&= A_{k}^{-1}\left(A_{k-1}^{-1} \cdots A_{2}^{-1}A_{1}^{-1}\right)
\end{align*}
So the proof by induction is complete.

5. This is property 4 with $A_{1} = A_{2} = \cdots = A_{k} = A$.

6. The reader is invited to verify this.

7. This is Example 2.4.8.

The reversal of the order of the inverses in properties 3 and 4 of Theorem 2.4.4 is a consequence of the fact that matrix multiplication is not
commutative. Another manifestation of this comes when matrix equations are dealt with. If a matrix equation $B = C$ is given, it can be $\textit{left-multiplied}$ by a matrix $A$ to yield $AB = AC$. Similarly, $\textit{right-multiplication}$ gives $BA = CA$. However, we cannot mix the two: If $B = C$, it need $\textit{not}$ be the case that $AB = CA$ even if $A$ is invertible, for example, $A = \left[ \begin{array}{rr}
1 & 1 \\
0 & 1
\end{array} \right]$, $B = \left[ \begin{array}{rr}
0 & 0 \\
1 & 0
\end{array} \right] = C$.

Part 7 of Theorem 2.4.4 together with the fact that $(A^{T})^{T} = A$ gives

Corollary 2.4.1

A square matrix $A$ is invertible if and only if $A^{T}$ is invertible.

 

 

Example 2.4.10

Find $A$ if $(A^{T} - 2I)^{-1} = \left[ \begin{array}{rr}
2 & 1 \\
-1 & 0
\end{array} \right]$.

Solution:

By Theorem 2.4.4(2) and Example 2.4.4, we have
\begin{equation*}
(A^{T} - 2I) = \left[ \left(A^{T} - 2I\right)^{-1} \right]^{-1} = \left[ \begin{array}{rr}
2 & 1 \\
-1 & 0
\end{array} \right]^{-1} = \left[ \begin{array}{rr}
0 & -1 \\
1 & 2
\end{array} \right]
\end{equation*}
Hence $A^{T} = 2I + \left[ \begin{array}{rr}
0 & -1 \\
1 & 2
\end{array} \right] = \left[ \begin{array}{rr}
2 & -1 \\
1 & 4
\end{array} \right]$, so $A = \left[ \begin{array}{rr}
2 & 1 \\
-1 & 4
\end{array} \right]$
by Theorem 2.4.4(7).

The following important theorem collects a number of conditions all equivalent to invertibility. It will be referred to frequently below.

Theorem 2.4.5 Inverse Theorem

The following conditions are equivalent for an $n \times n$ matrix $A$:

1. $A$ is invertible.

2. The homogeneous system $A\vec{x} = \vec{0}$ has only the trivial solution $\vec{x} = \vec{0}$.

3. $A$ can be carried to the identity matrix $I_{n}$ by elementary row operations.

4. The system $A\vec{x} = \vec{b}$ has at least one solution $\vec{x}$ for every choice of column $\vec{b}$.

5. There exists an $n \times n$ matrix $C$ such that $AC = I_{n}$.

 

 

Proof:

We show that each of these conditions implies the next, and that (5) implies (1).

(1) $\Rightarrow$ (2). If $A^{-1}$ exists, then $A\vec{x} = \vec{0}$ gives $\vec{x} = I_{n}\vec{x} = A^{-1}A\vec{x} = A^{-1}\vec{0} = \vec{0}$.

(2) $\Rightarrow$ (3). Assume that (2) is true. Certainly $A \to R$ by row operations where $R$ is a reduced row-echelon matrix. It suffices to show that $R = I_{n}$. Suppose that this is not the case. Then $R$ has a row of zeros (being square). Now consider the augmented matrix $\left[ \begin{array}{c|c}
A & \vec{0}
\end{array} \right]$ of the system $A\vec{x} = \vec{0}$. Then $\left[ \begin{array}{c|c}
A & \vec{0}
\end{array} \right] \to \left[ \begin{array}{c|c}
R & \vec{0}
\end{array} \right]$ is the reduced form, and $\left[ \begin{array}{c|c}
R & \vec{0}
\end{array} \right]$ also has a row of zeros. Since $R$ is square and has a row of zeros, it has fewer than $n$ leading 1s, so there must be at least one nonleading variable, and hence at least one parameter. Hence the system $A\vec{x} = \vec{0}$ has infinitely many solutions, contrary to (2). So $R = I_{n}$ after all.

(3) $\Rightarrow$ (4). Consider the augmented matrix $\left[ \begin{array}{c|c}
A & \vec{b}
\end{array} \right]$ of the system $A\vec{x} = \vec{b}$. Using (3), let $A \to I_{n}$ by a sequence of row operations. Then these same operations carry $\left[ \begin{array}{c|c}
A & \vec{b}
\end{array} \right] \to \left[ \begin{array}{c|c}
I_{n} & \vec{c}
\end{array} \right]$ for some column $\vec{c}$. Hence the system $A\vec{x} = \vec{b}$ has a solution (in fact unique) by gaussian elimination. This proves (4).

(4) $\Rightarrow$ (5). Write $I_{n} = \left[ \begin{array}{cccc}
\vec{e}_{1} & \vec{e}_{2} & \cdots & \vec{e}_{n}
\end{array} \right]$ where $\vec{e}_{1}, \vec{e}_{2}, \dots, \vec{e}_{n}$ are the columns of $I_{n}$. For each $j = 1, 2, \dots, n$, the system $A\vec{x} = \vec{e}_{j}$ has a solution $\vec{c}_{j}$ by (4), so $A\vec{c}_{j} = \vec{e}_{j}$. Now let $C = \left[ \begin{array}{cccc}
\vec{c}_{1} & \vec{c}_{2} & \cdots & \vec{c}_{n}
\end{array} \right]$ be the $n \times n$ matrix with these matrices $\vec{c}_{j}$ as its columns. Then Definition 2.9 gives (5):
\begin{equation*}
AC = A \left[ \begin{array}{cccc}
\vec{c}_{1} & \vec{c}_{2} & \cdots & \vec{c}_{n}
\end{array} \right] = \left[ \begin{array}{cccc}
A\vec{c}_{1} & A\vec{c}_{2} & \cdots & A\vec{c}_{n}
\end{array} \right] = \left[ \begin{array}{cccc}
\vec{e}_{1} & \vec{e}_{2} & \cdots & \vec{e}_{n}
\end{array} \right] = I_{n}
\end{equation*}
(5) $\Rightarrow$ (1). Assume that (5) is true so that $AC = I_{n}$ for some matrix $C$. Then $C\vec{x} = \vec{0}$ implies $\vec{x} = \vec{0}$ (because $\vec{x} = I_{n}\vec{x} = (AC)\vec{x} = A(C\vec{x}) = A\vec{0} = \vec{0}$). Thus condition (2) holds for the matrix $C$ rather than $A$. Hence the argument above that (2) $\Rightarrow$ (3) $\Rightarrow$ (4) $\Rightarrow$ (5) (with $A$ replaced by $C$) shows that a matrix $C^\prime$ exists such that $CC^\prime = I_{n}$. But then
\begin{equation*}
A = AI_{n} = A(CC^\prime) = (AC)C^\prime = I_{n}C^\prime = C^\prime
\end{equation*}
Thus $CA = CC^\prime = I_{n}$ which, together with $AC = I_{n}$, shows that $C$ is the inverse of $A$. This proves (1).

The proof of (5) $\Rightarrow$ (1) in Theorem 2.4.5 shows that if $AC = I$ for square matrices, then necessarily $CA = I$, and hence that $C$ and $A$ are inverses of each other. We record this important fact for reference.

Corollary 2.4.2

If $A$ and $C$ are square matrices such that $AC = I$, then also $CA = I$. In particular, both $A$ and $C$ are invertible, $C = A^{-1}$, and $A = C^{-1}$.

 

Here is a quick way to remember Corollary 2.4.2. If $A$ is a square matrix, then

1. If $AC=I$ then $C=A^{-1}$.
2. If $CA=I$ then $C=A^{-1}$.

Observe that Corollary 2.4.2 is false if $A$ and $C$ are not square matrices. For example, we have
\begin{equation*}
\left[ \begin{array}{rrr}
1 & 2 & 1 \\
1 & 1 & 1
\end{array} \right] \left[ \begin{array}{rr}
-1 & 1 \\
1 & -1 \\
0 & 1
\end{array} \right] = I_{2} \quad \mbox{ but }
\left[ \begin{array}{rr}
-1 & 1 \\
1 & -1 \\
0 & 1
\end{array} \right] \left[ \begin{array}{rrr}
1 & 2 & 1 \\
1 & 1 & 1
\end{array} \right] \neq I_{3}
\end{equation*}
In fact, it can be verified that if $AB = I_{m}$ and $BA = I_{n}$, where $A$ is $m \times n$ and $B$ is $n \times m$, then $m = n$ and $A$ and $B$ are (square) inverses of each other.

 

An $n \times n$ matrix $A$ has $\func{rank }n$ if and only if (3) of Theorem 2.4.5 holds. Hence

Corollary 2.4.3

An $n \times n$ matrix $A$ is invertible if and only if $\func{rank }A = n$.

 

 

License


Linear Algebra with Applications Copyright © by Xinli Wang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
