An identity matrix is a square \(n \times n\) matrix with \(1\)'s on the diagonal and \(0\)'s everywhere else. Here are the 2, 3, and 4 dimensional identity matrices: \[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \] \[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \] \[ \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]
Identity matrices act as the multiplicative identity for matrix multiplication. They are represented with the letter \(I.\) If \(I\) is the \(n \times n\) identity and \(A\) is any matrix with \(n\) rows, then \(IA = A.\) If \(A\) is any matrix with \(n\) columns, then \(AI = A.\)
A special case of the previous statement is that if \(\overrightarrow{v}\) is a column vector with \(n\) rows, then \(I\overrightarrow{v} = \overrightarrow{v}.\)
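The identity property can be checked numerically. The following is a minimal sketch in plain Python; the helper names `identity` and `matmul` and the example matrix are illustrative assumptions, not part of the lesson.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply X (m x k) by Y (k x p); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, 1, 2],
     [3, -1, 1]]        # an arbitrary 2 x 3 example matrix
v = [[7], [8], [9]]     # a column vector with 3 rows

left = matmul(identity(2), A)    # I is 2 x 2 because A has 2 rows
right = matmul(A, identity(3))   # I is 3 x 3 because A has 3 columns
Iv = matmul(identity(3), v)      # the identity times a vector

print(left == A, right == A, Iv == v)
```

Note that the size of \(I\) depends on which side it multiplies from when \(A\) is not square.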
In the previous lesson we used the idea of elimination and augmented matrices to solve a system of equations. In particular, we wrote a system of equations as \(A\overrightarrow{v} = \overrightarrow{b},\) then simplified the representation in an augmented matrix \([A|\overrightarrow{b}].\) Algebraic manipulation was done using three operations:
1. Multiplying a row by a scalar.
2. Adding one row to another.
3. Swapping rows.
At the end, we had an identity on the left and a solution on the right: \([I|\overrightarrow{c}].\)
Using these three operations to convert \(A\) to \(I\) is called row reduction. When the left side of the augmented matrix is reduced to \(I,\) the vector \(\overrightarrow{c}\) is the solution to the system of equations.
This only works when \(A\) is a square matrix, meaning there are the same number of equations as variables, and when it is possible to reduce \(A\) to the identity. Sometimes this is not possible. For example, this square matrix cannot be reduced to the identity: \[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]
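As a rough sketch of how row reduction might be automated, the function below reduces \([A|\overrightarrow{b}]\) to \([I|\overrightarrow{c}]\) and reports failure when \(A\) cannot be reduced to the identity. The function name `row_reduce`, the order in which it picks rows, and the small \(2 \times 2\) system are assumptions made for illustration. It uses exact fractions to avoid rounding.

```python
from fractions import Fraction

def row_reduce(A, b):
    """Reduce [A|b] to [I|c] and return c, or None if A cannot be reduced to I."""
    n = len(A)
    # Build the augmented matrix with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                        # no usable row: A does not reduce to I
        M[col], M[pivot] = M[pivot], M[col]    # swap rows
        scale = 1 / M[col][col]
        M[col] = [scale * x for x in M[col]]   # multiply a row by a scalar
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = -M[r][col]
                # add a multiple of the pivot row (scaling and adding combined)
                M[r] = [x + factor * p for x, p in zip(M[r], M[col])]
    return [row[n] for row in M]

solvable = row_reduce([[2, 1], [1, 3]], [3, 5])
stuck = row_reduce([[1, 0, 0], [0, 1, 0], [0, 0, 0]], [1, 2, 3])
print(solvable, stuck)
```

For the matrix with a row of zeros shown above, no row operation can produce a \(1\) in the last diagonal position, so the sketch returns `None`.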
Recall the fundamental problem: We are trying to solve \(A\overrightarrow{v} = \overrightarrow{b}.\)
In \(1\)-dimension, we have \(av = b\) and as long as \(a \neq 0\) we can divide by \(a.\) Using fancier terms, the number \(1\) is the \(1\)-dimensional identity and dividing by \(a\) is the same as multiplying by \(a^{-1},\) called the inverse of \(a.\)
An inverse is what you multiply by to get the identity. So, in \(1\)-dimension, we start with \(av = b,\) multiply both sides by the inverse of \(a\) to get \(a^{-1}av = a^{-1}b,\) use the fact that \(a^{-1}a\) is the identity to get \(1v = a^{-1}b,\) and finally use the property of the identity to get \(v = a^{-1}b.\) This only works if \(a^{-1}\) exists, which it does for any \(a\) other than \(0.\)
That is a mouthful for \(1\)-dimension but we want to make it work for higher dimensions. To solve \(A\overrightarrow{v} = \overrightarrow{b},\) we need a matrix \(A^{-1}\) that has the property \(A^{-1}A = I.\) That matrix is the inverse matrix of \(A.\) If such a matrix exists, then we can multiply both sides of the fundamental problem by \(A^{-1}:\) \[\Rightarrow A^{-1}A\overrightarrow{v} = A^{-1}\overrightarrow{b}\] By definition of \(A^{-1},\) \(A^{-1}A = I.\) So we will have \[I\overrightarrow{v} = A^{-1}\overrightarrow{b}\] Since the identity times a vector is the vector, \(I\overrightarrow{v} = \overrightarrow{v}.\) Thus, we have a solution: \[\overrightarrow{v} = A^{-1}\overrightarrow{b}\]
Row reduction is a step-by-step process for multiplying by \(A^{-1}\) and finding \(\overrightarrow{v}.\) Each of the three operations used in row reduction can be represented by matrix multiplication.
1. Multiplying a row by a scalar can be achieved by multiplying on the left by a matrix which is almost the identity, except that the scalar takes the place of the \(1\) in the row you are multiplying. For example, here is how you can use a matrix to multiply row \(2\) by \(5\): \[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 2 \\ 3 & -1 & 1 \\ 4 & 0 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2 \\ 15 & -5 & 5 \\ 4 & 0 & 6 \end{bmatrix} \]
2. Adding row \(i\) to row \(j\) can be done by multiplying a matrix by another matrix which is almost the identity except there is a \(1\) in row \(j\) and column \(i.\) For example, adding row \(1\) to row \(2\) can be done as follows: \[ \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 2 \\ 3 & -1 & 1 \\ 4 & 0 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2 \\ 4 & 0 & 3 \\ 4 & 0 & 6 \end{bmatrix} \]
3. Swapping two rows can be done by multiplying by a matrix which is almost the identity except the rows you wish to swap are also swapped in the identity. For example, to swap rows \(2\) and \(3,\) multiply as follows: \[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 2 \\ 3 & -1 & 1 \\ 4 & 0 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2 \\ 4 & 0 & 6 \\ 3 & -1 & 1 \end{bmatrix} \]
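The three matrices in this list can be built programmatically. This is a minimal sketch; the function names `scale_matrix`, `add_matrix`, and `swap_matrix` are hypothetical, rows are numbered from \(1\) to match the text, and `add_matrix` assumes \(i \neq j.\)

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply X (m x k) by Y (k x p); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def scale_matrix(n, i, k):
    """Almost the identity: k replaces the 1 in row i."""
    E = identity(n)
    E[i - 1][i - 1] = k
    return E

def add_matrix(n, i, j):
    """Almost the identity: an extra 1 in row j, column i adds row i to row j."""
    E = identity(n)
    E[j - 1][i - 1] = 1
    return E

def swap_matrix(n, i, j):
    """The identity with rows i and j swapped."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

M = [[1, 1, 2], [3, -1, 1], [4, 0, 6]]          # the matrix used in the examples
scaled = matmul(scale_matrix(3, 2, 5), M)       # multiply row 2 by 5
added = matmul(add_matrix(3, 1, 2), M)          # add row 1 to row 2
swapped = matmul(swap_matrix(3, 2, 3), M)       # swap rows 2 and 3
print(scaled, added, swapped, sep="\n")
```

In each case the operation matrix multiplies on the left, so it acts on the rows of `M`.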
Now let's solve a simple example to see how row reduction allows us to find \(A^{-1}.\) \[ \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 5 \\ 9 \\ 7 \end{bmatrix} \] The augmented matrix \([A|\overrightarrow{b}]\) is \[ \left[ \begin{array}{ccc|c} 1 & 1 & 0 & 5 \\ 1 & 1 & 1 & 9 \\ 1 & 2 & 0 & 7 \\ \end{array} \right] \]
Step 1: Multiply the first row by \(-1\) so we can use it to cancel the other \(1\)'s in the first column. This is the same as multiplying by \[ S_1= \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \] and gives us \[ [S_1A|S_1\overrightarrow{b}]= \left[ \begin{array}{ccc|c} -1 & -1 & 0 & -5 \\ 1 & 1 & 1 & 9 \\ 1 & 2 & 0 & 7 \\ \end{array} \right] \]
Step 2: Add row \(1\) to row \(2.\) This is the same as multiplying by \[ S_2= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \] and leaves \[ [S_2S_1A|S_2S_1\overrightarrow{b}]= \left[ \begin{array}{ccc|c} -1 & -1 & 0 & -5 \\ 0 & 0 & 1 & 4 \\ 1 & 2 & 0 & 7 \\ \end{array} \right] \]
Step 3: Add row \(1\) to row \(3,\) which is multiplying by \[ S_3= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \] to get \[ [S_3S_2S_1A|S_3S_2S_1\overrightarrow{b}]= \left[ \begin{array}{ccc|c} -1 & -1 & 0 & -5 \\ 0 & 0 & 1 & 4 \\ 0 & 1 & 0 & 2 \\ \end{array} \right] \]
Step 4: Swap the last two rows, which is the same as multiplying by \[ S_4= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \] to get \[ [S_4S_3S_2S_1A|S_4S_3S_2S_1\overrightarrow{b}]= \left[ \begin{array}{ccc|c} -1 & -1 & 0 & -5 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ \end{array} \right] \]
Step 5: Add the second row to the first, which is the same as multiplying by \[ S_5= \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \] to get \[ [S_5S_4S_3S_2S_1A|S_5S_4S_3S_2S_1\overrightarrow{b}]= \left[ \begin{array}{ccc|c} -1 & 0 & 0 & -3 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ \end{array} \right] \]
Step 6: Multiply the first row by \(-1,\) which is the same as multiplying by \[ S_6= \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \] to get \[ [S_6S_5S_4S_3S_2S_1A|S_6S_5S_4S_3S_2S_1\overrightarrow{b}]= [I|\overrightarrow{c}]= \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ \end{array} \right] \]
The solution is \(\overrightarrow{c} = [3,2,4],\) but what is important to notice is that \(S_6S_5S_4S_3S_2S_1A = I.\) That means \(S_6S_5S_4S_3S_2S_1 = A^{-1}.\) So row reduction both found \(A^{-1}\) and multiplied both sides by it, one operation at a time, over 6 steps.
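The six steps can be replayed as matrix multiplications. The sketch below applies each \(S_k\) in turn to both sides of the augmented matrix; the variable names and the `matmul` helper are illustrative assumptions.

```python
def matmul(X, Y):
    """Multiply X (m x k) by Y (k x p); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, 1, 0], [1, 1, 1], [1, 2, 0]]
b = [[5], [9], [7]]                      # right-hand side as a column vector

steps = [
    [[-1, 0, 0], [0, 1, 0], [0, 0, 1]],  # S1: multiply row 1 by -1
    [[1, 0, 0], [1, 1, 0], [0, 0, 1]],   # S2: add row 1 to row 2
    [[1, 0, 0], [0, 1, 0], [1, 0, 1]],   # S3: add row 1 to row 3
    [[1, 0, 0], [0, 0, 1], [0, 1, 0]],   # S4: swap rows 2 and 3
    [[1, 1, 0], [0, 1, 0], [0, 0, 1]],   # S5: add row 2 to row 1
    [[-1, 0, 0], [0, 1, 0], [0, 0, 1]],  # S6: multiply row 1 by -1
]

left, right = A, b
for S in steps:
    left = matmul(S, left)     # left side of the augmented matrix
    right = matmul(S, right)   # right side of the augmented matrix

print(left)    # the identity
print(right)   # the solution as a column vector
```

Each new matrix multiplies on the left of everything before it, which is why the product is written \(S_6S_5S_4S_3S_2S_1\) with the first step rightmost.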
We can compute \(A^{-1}\) explicitly by multiplying \(S_6S_5S_4S_3S_2S_1.\) \[ A^{-1}= \begin{bmatrix} 2 & 0 & -1 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix} \]
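As a check, the product of the six step matrices can be multiplied out and compared with the matrix above. This is a minimal sketch with illustrative names; it also confirms that \(A^{-1}A = I\) and that \(A^{-1}\overrightarrow{b}\) gives the solution in a single multiplication.

```python
def matmul(X, Y):
    """Multiply X (m x k) by Y (k x p); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 1, 0], [1, 1, 1], [1, 2, 0]]
b = [[5], [9], [7]]

steps = [                                # S1 through S6, in the order applied
    [[-1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [1, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 1, 0], [1, 0, 1]],
    [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[1, 1, 0], [0, 1, 0], [0, 0, 1]],
    [[-1, 0, 0], [0, 1, 0], [0, 0, 1]],
]

A_inv = I3
for S in steps:
    A_inv = matmul(S, A_inv)   # builds S6 S5 S4 S3 S2 S1 from right to left

check = matmul(A_inv, A)       # should be the identity
solution = matmul(A_inv, b)    # solves the system in one multiplication

print(A_inv)
print(solution)
```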