The standard deviation of a discrete random variable measures how far the values of the random variable typically fall from the mean.
Let \(X\) be a discrete random variable. The standard deviation of \(X\) is
\[\sigma_X = \sqrt{E[(X - E[X])^2]}\]
The variance of a discrete random variable is the square of the standard deviation.
Let \(X\) be a discrete random variable. The variance of \(X\) is
\[\sigma_X^2 = E[(X - E[X])^2]\]
The variance is also written \(\text{Var}(X).\)
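To make these definitions concrete, here is a minimal Python sketch that computes \(E[X],\) \(\text{Var}(X),\) and \(\sigma_X\) directly from the definitions above. The function names, the dictionary representation of the distribution, and the fair-die example are our own illustrative choices, not part of the text.

```python
import math

def expectation(pmf):
    """E[X] for a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E[X])^2], computed directly from the definition."""
    mu = expectation(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())

def std_dev(pmf):
    """sigma_X is the square root of the variance."""
    return math.sqrt(variance(pmf))

# Illustration: a fair six-sided die.
pmf = {k: 1 / 6 for k in range(1, 7)}
print(expectation(pmf), variance(pmf), std_dev(pmf))  # 3.5, ~2.917, ~1.708
```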
Recall the distance formula from algebra. The distance between the points \((x_1, y_1)\) and \((x_2, y_2)\) is \[\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}\] Similarly, in \(3\) dimensions, the distance between the points \((x_1, y_1, z_1)\) and \((x_2, y_2, z_2)\) is \[\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}\] This pattern continues for any number of coordinates.
For this example, let \(X\) be a discrete random variable that has non-zero probability on the points \(\{a_1, a_2, \dots, a_n\}.\) We can make one \(n\)-dimensional point that stores the values that \(X\) may take: \[(a_1, a_2, \dots, a_n )\] We can make another \(n\)-dimensional point that has \(E[X]\) in every coordinate: \[(E[X], E[X], \dots, E[X])\] The standard deviation of \(X\) is the distance between these two points, weighted by the probabilities of \(X:\) \[\sigma_X = \sqrt{P(X = a_1)(a_1 - E[X])^2 + \dots + P(X = a_n)(a_n - E[X])^2}\]
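Read this way, the standard deviation is just a probability-weighted Euclidean distance. Here is a short Python sketch of that idea; the function name `weighted_distance` and the sample distribution are our own illustrations, not from the text.

```python
import math

def weighted_distance(p, q, weights):
    """Euclidean distance between points p and q, with each squared
    coordinate difference scaled by the corresponding weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))

# Illustration: X is 0 or 10, each with probability 0.5, so E[X] = 5.
values = (0, 10)        # (a_1, ..., a_n)
probs = (0.5, 0.5)      # P(X = a_i)
mu = sum(a * p for a, p in zip(values, probs))
print(weighted_distance(values, (mu,) * len(values), probs))  # 5.0
```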
Example using the distance formula: Let \(X\) be a random variable defined by
\[P(X = 2) = 0.8, P(X = 5) = 0.2\]
We will use the distance formula to find the standard deviation.
The first point is made of the values \(X\) can take. This is \((2, 5).\)
The second point should have the same number of coordinates as the first, \(2\) in this case, and every coordinate should be \(E[X].\) We have to find \(E[X].\)
\[E[X] = 2 \cdot 0.8 + 5 \cdot 0.2 = 1.6 + 1 = 2.6\]
So, the second point is \((2.6, 2.6).\)
Finally, we need to find the distance between \((2, 5)\) and \((2.6, 2.6),\) but we need to weight the squared differences by how likely it is that \(X\) takes those values. The ordinary (unweighted) distance formula would give
\[\sqrt{(2 - 2.6)^2 + (5 - 2.6)^2}\]
With the weights, the standard deviation is
\begin{align}
\sqrt{0.8 \cdot (2 - 2.6)^2 + 0.2 \cdot (5 - 2.6)^2} & = \sqrt{0.8 \cdot (0.6)^2 + 0.2 \cdot (2.4)^2} \\
& = \sqrt{0.288 + 1.152} \\
& = \sqrt{1.44} \\
& = 1.2
\end{align}
So, \(\sigma_X = 1.2.\)
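As a quick sanity check, the arithmetic above takes only a couple of lines of Python:

```python
import math

# sigma_X = sqrt(0.8 * (2 - 2.6)^2 + 0.2 * (5 - 2.6)^2)
sigma = math.sqrt(0.8 * (2 - 2.6) ** 2 + 0.2 * (5 - 2.6) ** 2)
print(sigma)  # 1.2 (up to floating-point rounding)
```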
We will work with variance more often than standard deviation.
We can use the linearity of expectation to derive an alternate variance formula, one that is often easier to compute. Since \(E[X]\) is a constant, \(E[2XE[X]] = 2E[X]E[X]\) and \(E[E[X]^2] = E[X]^2:\)
\begin{align}
\text{Var}(X) & = E[(X - E[X])^2] \\
& = E[X^2 - 2XE[X] + E[X]^2] \\
& = E[X^2] - E[2XE[X]] + E[E[X]^2] \\
& = E[X^2] - 2E[X]E[X] + E[X]^2 \\
& = E[X^2] - E[X]^2
\end{align}
So, \(\text{Var}(X) = E[X^2] - E[X]^2.\)
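The identity is easy to verify numerically. Here is a minimal sketch, with function names of our own, that computes the variance both ways on a small illustrative distribution and confirms they agree:

```python
def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

def var_definition(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    mu = expectation(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())

def var_alternate(pmf):
    """Var(X) = E[X^2] - E[X]^2."""
    e_x2 = sum(x ** 2 * p for x, p in pmf.items())
    return e_x2 - expectation(pmf) ** 2

# Illustration: X is 1 or 3, each with probability 0.5.
pmf = {1: 0.5, 3: 0.5}
print(var_definition(pmf), var_alternate(pmf))  # both 1.0
```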
Example: Let's use the alternate formula to compute the variance and standard deviation of the same random variable \(X\) from the distance formula example above. In that example, we defined \(X\) by
\[P(X = 2) = 0.8, P(X = 5) = 0.2\]
Squaring each value that \(X\) takes gives the distribution of \(X^2:\)
\[P(X^2 = 4) = 0.8, P(X^2 = 25) = 0.2\]
Now we can compute the expected values:
\begin{align}
& E[X] = 2 \cdot 0.8 + 5 \cdot 0.2 = 1.6 + 1 = 2.6 \\
& E[X^2] = 4 \cdot 0.8 + 25 \cdot 0.2 = 3.2 + 5 = 8.2
\end{align}
Last, we plug these values into the variance formula.
\begin{align}
\text{Var}(X) & = E[X^2] - E[X]^2 \\
& = 8.2 - 2.6^2 \\
& = 1.44
\end{align}
So, \(\text{Var}(X) = 1.44.\)
The standard deviation of \(X\) is \(\sigma_X = \sqrt{1.44} = 1.2,\) which matches what we found using the distance formula method. Both methods give the same answer.
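Both computations are easy to confirm in code; a few lines (variable names ours) reproduce \(E[X],\) \(E[X^2],\) the variance, and the standard deviation for this distribution:

```python
import math

pmf = {2: 0.8, 5: 0.2}
e_x = sum(x * p for x, p in pmf.items())        # E[X] = 2.6
e_x2 = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2] = 8.2
var = e_x2 - e_x ** 2                           # Var(X) = 1.44
print(var, math.sqrt(var))  # 1.44, 1.2 (up to floating-point rounding)
```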
Check your understanding:
1. A fair coin is flipped twice. Let \(X\) be the number of heads. What is \(\text{Var}(X)?\)
2. A random variable \(X\) has the following distribution: \[P(X = 0) = 0.1, P(X = 1) = 0.3, P(X = 2) = 0.2, P(X = 3) = 0.4\] What is the standard deviation of \(X?\)
3. A random variable is known to have the following:
\[E[2X^2-1] = 3, \text{Var}(X) = 1.5\]
What is \(E[X]?\)
4. If \(\text{Var}(X) = 121,\) which of the following must be true?