The Naive Bayes algorithm is a classification algorithm: it assigns data points to one of a set of discrete classes, and the features it uses may be discrete or continuous. A Naive Bayes classifier is applicable when each data point \(d\) is described by feature values \(x_1, x_2, \dots, x_n\) and we want to predict the class of \(d.\)
Given \(x_1, x_2, \dots, x_n,\) the probability that the class of \(d\) is \(c\) is \[P(C = c | x_1, x_2, \dots, x_n) = \frac{P(x_1, x_2, \dots, x_n|C = c)P(C = c)}{P(x_1, x_2, \dots, x_n)}\] by Bayes' Rule.
The Naive Bayes algorithm is called naive because we assume independence of \(x_1, x_2, \dots, x_n\) given \(C = c.\) By the independence assumption, we get \[\frac{P(x_1, x_2, \dots, x_n|C = c)P(C = c)}{P(x_1, x_2, \dots, x_n)} = \frac{P(x_1|C = c)P(x_2|C = c)\dots P(x_n|C = c)P(C = c)}{P(x_1, x_2, \dots, x_n)}\]
We will classify \(d\) as the most likely class \(c.\) So, the class is \[\text{argmax}_{c} \frac{P(x_1|C = c)P(x_2|C = c)\dots P(x_n|C = c)P(C = c)}{P(x_1, x_2, \dots, x_n)}\] Notice that the denominator does not depend on \(c,\) so dropping the denominator does not change the class. The class we assign to \(d\) is \[\hat{C} = \text{argmax}_{c} P(x_1|C = c)P(x_2|C = c)\dots P(x_n|C = c)P(C = c)\]
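As a sketch, the argmax rule can be implemented by summing log-probabilities, which is equivalent to multiplying the probabilities but avoids numerical underflow. The spam-filtering priors and likelihoods below are invented purely for illustration.

```python
import math

def classify(features, classes, prior, likelihood):
    """Return the most likely class for the given feature values.

    `prior[c]` is P(C = c); `likelihood[(x, c)]` is P(x | C = c).
    Log-probabilities are summed instead of multiplying raw
    probabilities, which avoids numerical underflow.
    """
    def score(c):
        return math.log(prior[c]) + sum(
            math.log(likelihood[(x, c)]) for x in features)
    return max(classes, key=score)

# Toy spam example (all numbers are made up for illustration):
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {
    ("offer", "spam"): 0.6, ("offer", "ham"): 0.05,
    ("meeting", "spam"): 0.1, ("meeting", "ham"): 0.4,
}
print(classify(["offer"], ["spam", "ham"], prior, likelihood))  # spam
```

Here the word "offer" is far more likely in spam than in ham, so it outweighs the prior that most email is ham.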
The values \(P(C = c)\) form a prior distribution over the classes. For example, if the application is spam filtering, then without knowing anything about a random email it is unlikely that the email is spam. However, once you know the words \(x_1, x_2, \dots, x_n\) in the email, the probability that it is spam may exceed the probability that it is ham.
As a first example, suppose each data point is a point in the plane belonging to either a red class or a blue class, and we want to predict the class of a new point from its coordinates. The algorithm works as follows:
Let \((x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\) be the coordinates of the red points. Define \(\mu_{x,r}\) to be the mean of the \(x\)-coordinates of the red points:
\[\mu_{x,r} = \sum_{i=1}^n \frac{x_i}{n}\]
Define \(\mu_{y,r}\) to be the mean of the \(y\)-coordinates of the red points:
\[\mu_{y,r} = \sum_{i=1}^n \frac{y_i}{n}\]
Define \(\sigma_{x,r}^2\) to be the variance of the \(x\)-coordinates of the red points:
\[\sigma_{x,r}^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \mu_{x,r})^2\]
Define \(\sigma_{y,r}^2\) to be the variance of the \(y\)-coordinates of the red points:
\[\sigma_{y,r}^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \mu_{y,r})^2\]
Similar parameters can be found for the blue points.
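These parameter estimates can be computed directly; the sketch below uses the same \(n-1\) denominator as the variance formulas above. The sample red points are hypothetical.

```python
def class_stats(points):
    """Sample mean and (unbiased, n-1 denominator) variance of the
    x- and y-coordinates of one class's points."""
    n = len(points)
    mu_x = sum(x for x, _ in points) / n
    mu_y = sum(y for _, y in points) / n
    var_x = sum((x - mu_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mu_y) ** 2 for _, y in points) / (n - 1)
    return mu_x, mu_y, var_x, var_y

red = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical red points
print(class_stats(red))  # (2.0, 4.0, 1.0, 4.0)
```

Calling the same function on the blue points gives the blue class's parameters.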
The mean and variance are used to make an estimate using a normal distribution. Let \(R\) be the set of red points. The probability that a given data point \(d\) is in the red class can be determined once we know the coordinates of \(d.\)
\[P(d \in R| d = (u,v)) = \frac{P(d = (u,v)| d \in R)P(d \in R)}{P(d = (u,v))}\]
Let \(B\) be the set of blue data points. Without being given any information about a random data point, we can assume the probability a point is red or blue is equal to the proportion of red or blue points, respectively. So, letting \(|R|\) be the number of red points and \(|B|\) be the number of blue points,
\begin{align}
P(d \in R) &= \frac{|R|}{|R|+|B|} \\
P(d \in B) &= \frac{|B|}{|R|+|B|}
\end{align}
Then, for a given data point \(d\) with coordinates \((u,v),\) we can predict its class by comparing \(P(d \in R \mid d = (u,v))\) with \(P(d \in B \mid d = (u,v))\) and choosing the larger. Under the naive independence assumption, the likelihood \(P(d = (u,v) \mid d \in R)\) is the product of a normal density in \(u\) (with mean \(\mu_{x,r}\) and variance \(\sigma_{x,r}^2\)) and a normal density in \(v\) (with mean \(\mu_{y,r}\) and variance \(\sigma_{y,r}^2\)), and similarly for the blue class.
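This Gaussian classification rule can be sketched in Python. The normal-density factorization and the class-proportion priors follow the definitions above; the fitted parameters at the bottom are hypothetical.

```python
import math

def normal_pdf(z, mu, var):
    """Density at z of a normal distribution with mean mu and variance var."""
    return math.exp(-(z - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(u, v, red_stats, blue_stats, n_red, n_blue):
    """Return 'red' or 'blue' for the point (u, v).

    Each stats tuple is (mu_x, mu_y, var_x, var_y); the priors are the
    class proportions |R|/(|R|+|B|) and |B|/(|R|+|B|).
    """
    total = n_red + n_blue
    scores = {}
    for label, (mx, my, vx, vy), n in (
            ("red", red_stats, n_red), ("blue", blue_stats, n_blue)):
        scores[label] = normal_pdf(u, mx, vx) * normal_pdf(v, my, vy) * n / total
    return max(scores, key=scores.get)

# Hypothetical fitted parameters: red clustered near (0, 0), blue near (5, 5).
print(predict(1.0, 1.0, (0, 0, 1, 1), (5, 5, 1, 1), 10, 10))  # red
```

Since the shared denominator \(P(d = (u,v))\) is dropped, the two scores are not probabilities, but their ordering is the same.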
As another example, suppose some of the best mini-golf players have assembled to be ranked. Each player will be assigned a skill rating between \(0\) and \(5.\) Based on previously collected data and assumptions, the courses players may attempt have the following difficulties:
\[\begin{array}{c|c}
\text{Course} & \text{Difficulty} \\
\hline
1 & 0.5 \\
2 & 1.1 \\
3 & 1.8 \\
4 & 2.2 \\
5 & 2.5 \\
6 & 2.8 \\
7 & 3.1 \\
8 & 3.9 \\
9 & 4.4 \\
10 & 4.8
\end{array}\]
Players may or may not attempt all \(10\) courses.
Intuitively, a player of skill \(3\) should make par on courses \(1\) through \(6\) and should not make par on courses \(7\) through \(10.\) However, on a bad day a player of skill \(3\) may fail to make par on one of courses \(1\) through \(6,\) or on a good day make par on one of courses \(7\) through \(10.\) So, the model needs to allow for some leniency. If a player does not make par on course \(6\) but does make par on courses \(7,\) \(9,\) and \(10,\) then there should be a high probability that the player can make par on course \(8;\) the over-par score on course \(6\) should be considered a fluke.
The probability that a player of skill class \(c\) will make par on a course of difficulty \(d\) will be modeled with a logistic function (although other distributions could also work). Let \(D\) be the event that the player makes par on a course of difficulty \(d\) and let \(C\) be the player's skill class. Then
\[P(D|C = c) = \frac{1}{1+e^{d-c}}\]
Notice that as the difficulty \(d\) increases, the probability of making par decreases. Similarly, as the player's skill class \(c\) increases, the probability of making par increases.
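For a concrete check of this behavior, consider a player of skill \(c = 3\) on the easiest and hardest courses listed above:
\[P(D|C = 3) = \frac{1}{1+e^{0.5-3}} \approx 0.92 \quad \text{(course } 1\text{)}, \qquad P(D|C = 3) = \frac{1}{1+e^{4.8-3}} \approx 0.14 \quad \text{(course } 10\text{)}\]
So this player almost always pars the easiest course but rarely pars the hardest one, as intuition suggests.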
We will assume we have no information about a random player. So, our prior assumption about the player gives equal weight to all possible classes. The pdf is \(f(c) = \frac{1}{5}\) for \(0 \leq c \leq 5.\) Since \(f(c)\) is the same for all values \(c,\) we do not need to use it to compute the argmax. Therefore, to assign a player a class we need to compute something of the form
\[\hat{C} = \text{argmax}_{c} P(x_1|C = c)P(x_2|C = c)\dots P(x_n|C = c)\]
Let \(A\) be the set of difficulty levels of courses on which a player makes par, and \(B\) be the set of difficulty levels of courses on which a player does not make par. Let's further assume that difficulty \(0\) is easy enough that any player should make par and difficulty \(5\) is hard enough that no player could make par. Using \(P(D|C = c) = \frac{1}{1+e^{d-c}}\) for making par and \(1-\frac{1}{1+e^{d-c}}\) for not making par on a course of difficulty \(d,\) the class of the player will be
\[\hat{C} = \text{argmax}_{c} \frac{1}{1+e^{-c}}\left(1-\frac{1}{1+e^{5-c}}\right)\left[\prod_{d \in A} \frac{1}{1+e^{d-c}}\right] \left[\prod_{d \in B} \left(1-\frac{1}{1+e^{d-c}}\right)\right]\]
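This argmax has no closed form, but since \(c\) ranges over \([0,5]\) it can be approximated with a simple grid search over log-scores. The sketch below assumes the course difficulties listed earlier and includes the difficulty-\(0\) and difficulty-\(5\) anchor terms automatically.

```python
import math

def par_prob(d, c):
    """P(make par on difficulty d | skill c), logistic model."""
    return 1.0 / (1.0 + math.exp(d - c))

def skill(made_par, missed_par, grid_step=0.01):
    """Approximate the argmax over skill classes c in [0, 5].

    `made_par` / `missed_par` are the difficulty sets A and B; the
    anchors at difficulties 0 (always par) and 5 (never par) are
    included automatically.
    """
    def log_score(c):
        s = math.log(par_prob(0, c)) + math.log(1 - par_prob(5, c))
        s += sum(math.log(par_prob(d, c)) for d in made_par)
        s += sum(math.log(1 - par_prob(d, c)) for d in missed_par)
        return s
    candidates = [i * grid_step for i in range(int(5 / grid_step) + 1)]
    return max(candidates, key=log_score)

# A player who pars courses 1-6 (difficulties 0.5-2.8) but misses 7-10:
c_hat = skill({0.5, 1.1, 1.8, 2.2, 2.5, 2.8}, {3.1, 3.9, 4.4, 4.8})
print(round(c_hat, 2))
```

For this player the estimate lands a little above \(3,\) between the hardest course parred (difficulty \(2.8\)) and the easiest course missed (difficulty \(3.1\)).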
The assumption that every player could make par on a difficulty-\(0\) course and no player could make par on a difficulty-\(5\) course prevents a player's estimated skill from tending to infinity or negative infinity. For example, without the assumption, if a player attempts only one course and makes par, then the product is maximized as \(c \to \infty.\) On the other hand, if a player attempts only one course and does not make par, the product is maximized as \(c \to -\infty.\)
You can use the whiteboard to check which courses a player has attempted and whether they made par. The Naive Bayes algorithm will then be used to classify the player's skill.