Recall from my Naive Bayes post that there are several variants.  One of the variants that I’ll be talking about today is Gaussian Naive Bayes.

What is Gaussian Naive Bayes?

Gaussian Naive Bayes is similar to regular Naive Bayes, but instead of calculating probabilities from counts in a table, it uses a Normal (or Gaussian) distribution to determine the probability of observing a value.  The following probability density function is used to calculate that probability within a Normal distribution:

P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}

Here, \sigma is the standard deviation of the feature and \mu is the mean (or expected value) of the feature.
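To make this concrete, here is a minimal Python version of the density function (the helper name gaussian_pdf is my own; the later snippets in this post reuse it):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution evaluated at x."""
    coefficient = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)
```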

The difference from regular Naive Bayes?

The regular form of Naive Bayes doesn’t use a probability distribution to calculate probabilities.  Instead, you calculate how often each feature value occurs for a particular class directly from counts.  For example, suppose we had the following table:

Feature 1  Feature 2  Feature 3  Class
    0          3          4        B
    0          2          5        A
    1          2          4        A
    1          3          4        A
    0          2          5        B
    0          3          5        A

Note that we will represent probabilities in the form P(Class|F1, F2, F3).  We want to know P(A|1, 3, 4), the probability of class A given feature values 1, 3, and 4.  We start by isolating only the instances where the class is A, and then take the probability of each feature value separately.  So,

P(1 | A) = 1/2, P(3 | A) = 1/2, P(4 | A) = 1/2

P(A) = 2/3, P(1, 3, 4) = 1/6

Now plugging into the formula for Naive Bayes, we get:

P(A|1, 3, 4) = \frac{1}{P(1, 3, 4)}P(1 | A)P(3 | A)P(4 | A)P(A)= \frac{1}{1/6}(1/2)(1/2)(1/2)(2/3) = 1/2
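As a sanity check, here is a short Python sketch of that table-based calculation (the variable names are my own):

```python
# The toy dataset from the table above: (feature 1, feature 2, feature 3, class)
rows = [
    (0, 3, 4, "B"),
    (0, 2, 5, "A"),
    (1, 2, 4, "A"),
    (1, 3, 4, "A"),
    (0, 2, 5, "B"),
    (0, 3, 5, "A"),
]
query = (1, 3, 4)

a_rows = [r for r in rows if r[3] == "A"]
p_a = len(a_rows) / len(rows)  # P(A) = 2/3

# Naive independence assumption: multiply the per-feature conditionals P(f_i | A)
likelihood = 1.0
for i, value in enumerate(query):
    likelihood *= sum(1 for r in a_rows if r[i] == value) / len(a_rows)  # each 1/2

# Evidence P(1, 3, 4): how often this exact combination appears in the data
p_evidence = sum(1 for r in rows if r[:3] == query) / len(rows)  # 1/6

print(likelihood * p_a / p_evidence)  # 0.5
```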

In the case of Gaussian Naive Bayes, the formula is similar, but we need to resort to the probability density of a Normal distribution to calculate each probability.

To do that, we first need to calculate the mean and variance of each feature within class A (using the sample variance, which divides by n - 1):

\mu_{feature_1} = 0.5, \mu_{feature_2} = 2.5, \mu_{feature_3} = 4.5

\sigma^2_{feature_1} = 0.33, \sigma^2_{feature_2} = 0.33, \sigma^2_{feature_3} = 0.33

Plugging each feature value into the probability density above, with \sigma = \sqrt{0.33} \approx 0.577, we get the following values:

P(1 | A) = 0.475, P(3 | A) = 0.475, P(4 | A) = 0.475

P(A) = 2/3, P(1, 3, 4) = 1/6

Thus,

P(A|1, 3, 4) = \frac{1}{1/6}(0.475)(0.475)(0.475)(2/3) \approx 0.429
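The same calculation in Python, reusing the gaussian_pdf helper from earlier (statistics.stdev returns the sample standard deviation, the square root of the 0.33 variances above):

```python
import statistics

a_features = [(0, 2, 5), (1, 2, 4), (1, 3, 4), (0, 3, 5)]  # class-A rows

posterior = 2 / 3  # start from the prior P(A)
for i, x in enumerate((1, 3, 4)):
    column = [row[i] for row in a_features]
    mu = statistics.mean(column)      # 0.5, 2.5, 4.5
    sigma = statistics.stdev(column)  # sqrt(0.33) ~ 0.577 for every feature
    posterior *= gaussian_pdf(x, mu, sigma)  # each density ~ 0.475

posterior /= 1 / 6  # divide by the evidence P(1, 3, 4)
print(posterior)    # ~ 0.429
```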

If you wanted to classify a new instance, you would repeat the same calculation for class B and compare the results.  The class with the highest posterior probability wins, as in the sketch below.
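Continuing the sketch, a hypothetical class_posterior helper can score both classes and keep the larger (it reuses rows, query, and gaussian_pdf from the earlier snippets; the zero-variance guard is my own addition, similar in spirit to scikit-learn's var_smoothing):

```python
def class_posterior(rows, label, query):
    """Unnormalized Gaussian Naive Bayes score: prior times feature densities."""
    subset = [r[:3] for r in rows if r[3] == label]
    score = len(subset) / len(rows)  # the prior P(label)
    for i, x in enumerate(query):
        column = [r[i] for r in subset]
        sigma = statistics.stdev(column) or 1e-9  # guard against zero variance
        score *= gaussian_pdf(x, statistics.mean(column), sigma)
    return score

scores = {c: class_posterior(rows, c, query) for c in ("A", "B")}
print(max(scores, key=scores.get))  # "A" wins for the query (1, 3, 4)
```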

Pros and Cons

Gaussian Naive Bayes shares the advantages and disadvantages of regular Naive Bayes:

  • Both assume independence among features, which becomes a drawback when the features are strongly correlated.
  • Both are quick to calculate.
  • Both can be calculated on systems with limited resources.
  • Neither is as accurate as more sophisticated methods.

The difference is that Gaussian Naive Bayes is used on quantitative features.  The user needs to calculate the mean and variance of each feature for each class before calculating the probabilities.

It might be unorthodox, but you can use both versions of Naive Bayes to classify your data.  If you take this approach, I would recommend the Gaussian version on quantitative features and the regular version on categorical features.
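A rough sketch of that hybrid with scikit-learn on synthetic data (the column split, combination rule, and all variable names are my own assumptions, not an established API):

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB, GaussianNB

# Synthetic stand-in data: two quantitative columns and one
# ordinal-encoded categorical column, with two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X_num = rng.normal(size=(100, 2)) + y[:, None]
X_cat = rng.integers(0, 3, size=(100, 1))

gnb = GaussianNB().fit(X_num, y)
cnb = CategoricalNB().fit(X_cat, y)

# Each model's log posterior already includes the class prior, so
# subtract one copy of it; the sum is an unnormalized log posterior,
# which is all argmax needs.
joint = (gnb.predict_log_proba(X_num)
         + cnb.predict_log_proba(X_cat)
         - np.log(gnb.class_prior_))
pred = gnb.classes_[np.argmax(joint, axis=1)]
print((pred == y).mean())  # training accuracy of the combined model
```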

There is one interesting disadvantage that I found when playing with the dataset from theDevMasters (the one I showed in my Linear Regression post):

gender  ageYear  ageMonth  HeightIn  WeightLB
f  11.91667  143  56.3  85
f  12.91667  155  62.3  105
f  12.75  153  63.3  108
f  13.41667  161  59  92
f  15.91667  191  62.5  112.5
f  14.25  171  62.5  112
f  15.41667  185  59  104
f  11.83333  142  56.5  69
f  13.33333  160  62  94.5
f  11.66667  140  53.8  68.5
f  11.58333  139  61.5  104
f  14.83333  178  61.5  103.5
f  13.08333  157  64.5  123.5
f  12.41667  149  58.3  93
f  11.91667  143  51.3  50.5
f  12.08333  145  58.8  89
f  15.91667  191  65.3  107
f  12.5  150  59.5  78.5
f  12.25  147  61.3  115
f  15  180  63.3  114
f  11.75  141  61.8  85
f  11.66667  140  53.5  81
f  13.66667  164  58  83.5
f  14.66667  176  61.3  112
f  15.41667  185  63.3  101
f  13.83333  166  61.5  103.5
f  14.58333  175  60.8  93.5
f  15  180  59  112
f  17.5  210  65.5  140
f  12.16667  146  56.3  83.5
f  14.16667  170  64.3  90
f  13.5  162  58  84
f  12.41667  149  64.3  110.5
f  11.58333  139  57.5  96
f  15.5  186  57.8  95
f  16.41667  197  61.5  121
f  14.08333  169  62.3  99.5
f  14.75  177  61.8  142.5
f  15.41667  185  65.3  118
f  15.16667  182  58.3  104.5
f  14.41667  173  62.8  102.5
f  13.83333  166  59.3  89.5
f  14  168  61.5  95
f  14.08333  169  62  98.5
f  12.5  150  61.3  94
f  15.33333  184  62.3  108
f  11.58333  139  52.8  63.5
f  12.25  147  59.8  84.5
f  12  144  59.5  93.5
f  14.75  177  61.3  112
f  14.83333  178  63.5  148.5
f  16.41667  197  64.8  112
f  12.16667  146  60  109
f  12.08333  145  59  91.5
f  12.25  147  55.8  75
f  12.08333  145  57.8  84
f  12.91667  155  61.3  107
f  13.91667  167  62.3  92.5
f  15.25  183  64.3  109.5
f  11.91667  143  55.5  84
f  15.25  183  64.5  102.5
f  15.41667  185  60  106
f  12.33333  148  56.3  77
f  12.25  147  58.3  111.5
f  12.83333  154  60  114
f  13  156  54.5  75
f  12  144  55.8  73.5
f  12.83333  154  62.8  93.5
f  12.66667  152  60.5  105
f  15.91667  191  63.3  113.5
f  15.83333  190  66.8  140
f  11.66667  140  60  77
f  12.33333  148  60.5  84.5
f  15.75  189  64.3  113.5
f  11.91667  143  58.3  77.5
f  14.83333  178  66.5  117.5
f  13.66667  164  65.3  98
f  13.08333  157  60.5  112
f  12.25  147  59.5  101
f  12.33333  148  59  95
f  14.75  177  61.3  81
f  14.25  171  61.5  91
f  14.33333  172  64.8  142
f  15.83333  190  56.8  98.5
f  15.25  183  66.5  112
f  11.91667  143  61.5  116.5
f  14.91667  179  63  98.5
f  15.5  186  57  83.5
f  15.16667  182  65.5  133
f  15.16667  182  62  91.5
f  11.83333  142  56  72.5
f  13.75  165  61.3  106.5
f  13.75  165  55.5  67
f  12.83333  154  61  122.5
f  12.5  150  54.5  74
f  12.91667  155  66  144.5
f  13.58333  163  56.5  84
f  11.75  141  56  72.5
f  12.25  147  51.5  64
f  17.5  210  62  116
f  14.25  171  63  84
f  13.91667  167  61  93.5
f  15.16667  182  64  111.5
f  12  144  61  92
f  16.08333  193  59.8  115
f  11.75  141  61.3  85
f  13.66667  164  63.3  108
f  15.5  186  63.5  108
f  14.08333  169  61.5  85
f  14.58333  175  60.3  86
f  15  180  61.3  110.5
m  13.75  165  64.8  98
m  13.08333  157  60.5  105
m  12  144  57.3  76.5
m  12.5  150  59.5  84
m  12.5  150  60.8  128
m  11.58333  139  60.5  87
m  15.75  189  67  128
m  15.25  183  64.8  111
m  12.25  147  50.5  79
m  12.16667  146  57.5  90
m  13.33333  160  60.5  84
m  13  156  61.8  112
m  14.41667  173  61.3  93
m  12.58333  151  66.3  117
m  11.75  141  53.3  84
m  12.5  150  59  99.5
m  13.66667  164  57.8  95
m  12.75  153  60  84
m  17.16667  206  68.3  134
m  20.83333  250  67.5  171.5
m  14.66667  176  63.8  98.5
m  14.66667  176  65  118.5
m  11.66667  140  59.5  94.5
m  15.41667  185  66  105
m  15  180  61.8  104
m  12.16667  146  57.3  83
m  15.25  183  66  105.5
m  11.66667  140  56.5  84
m  12.58333  151  58.3  86
m  12.58333  151  61  81
m  12  144  62.8  94
m  13.33333  160  59.3  78.5
m  14.83333  178  67.3  119.5
m  16.08333  193  66.3  133
m  13.5  162  64.5  119
m  13.66667  164  60.5  95
m  15.5  186  66  112
m  11.91667  143  57.5  75
m  14.58333  175  64  92
m  14.58333  175  68  112
m  14.58333  175  63.5  98.5
m  14.41667  173  69  112.5
m  14.16667  170  63.8  112.5
m  14.5  174  66  108
m  13.66667  164  63.5  108
m  12  144  59.5  88
m  13  156  66.3  106
m  12.41667  149  57  92
m  12  144  60  117.5
m  12.25  147  57  84
m  15.66667  188  67.3  112
m  14.08333  169  62  100
m  14.33333  172  65  112
m  12.5  150  59.5  84
m  16.08333  193  67.8  127.5
m  13.08333  157  58  80.5
m  14  168  60  93.5
m  11.66667  140  58.5  86.5
m  13  156  58.3  92.5
m  13  156  61.5  108.5
m  13.16667  158  65  121
m  15.33333  184  66.5  112
m  13  156  68.5  114
m  12  144  57  84
m  14.66667  176  61.5  81
m  14  168  66.5  111.5
m  12.41667  149  52.5  81
m  11.83333  142  55  70
m  15.66667  188  71  140
m  16.91667  203  66.5  117
m  11.83333  142  58.8  84
m  15.75  189  66.3  112
m  15.66667  188  65.8  150.5
m  16.66667  200  71  147
m  12.66667  152  59.5  105
m  14.5  174  69.8  119.5
m  13.83333  166  62.5  84
m  12.08333  145  56.5  91
m  11.91667  143  57.5  101
m  13.58333  163  65.3  117.5
m  13.83333  166  67.3  121
m  15.16667  182  67  133
m  14.41667  173  66  112
m  12.91667  155  61.8  91.5
m  13.5  162  60  105
m  14.75  177  63  111
m  14.75  177  60.5  112
m  14.58333  175  65.5  114
m  13.83333  166  62  91
m  12.5  150  59  98
m  12.5  150  61.8  118
m  15.66667  188  63.3  115.5
m  13.58333  163  66  112
m  14.25  171  61.8  112
m  13.5  162  63  91
m  11.75  141  57.5  85
m  14.5  174  63  112
m  11.83333  142  56  87.5
m  12.33333  148  60.5  118
m  11.66667  140  56.8  83.5
m  13.33333  160  64  116
m  12  144  60  89
m  17.16667  206  69.5  171.5
m  13.25  159  63.3  112
m  12.41667  149  56.3  72
m  16.08333  193  72  150
m  16.16667  194  65.3  134.5
m  12.66667  152  60.8  97
m  12.16667  146  55  71.5
m  11.58333  139  55  73.5
m  15.5  186  66.5  112
m  13.41667  161  56.8  75
m  12.75  153  64.8  128
m  16.33333  196  64.5  98
m  13.66667  164  58  84
m  13.25  159  62.8  99
m  14.83333  178  63.8  112
m  12.75  153  57.8  79.5
m  12.91667  155  57.3  80.5
m  14.83333  178  63.5  102.5
m  11.83333  142  55  76
m  13.66667  164  66.5  112
m  15.75  189  65  114
m  13.66667  164  61.5  140
m  13.91667  167  62  107.5
m  12.58333  151  59.3  87

When the means and variances of each feature were similar across the two classes, Gaussian Naive Bayes performed little better than random guessing, achieving about 59% accuracy.
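For reference, this is roughly how that experiment can be reproduced, assuming the table above is saved as kids.csv with the same headers (the file name and split are my own assumptions, and the exact accuracy will vary with the split):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("kids.csv")
X = df[["ageYear", "ageMonth", "HeightIn", "WeightLB"]]
y = df["gender"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = GaussianNB().fit(X_train, y_train)
print(model.score(X_test, y_test))  # around 0.59 in the run described above
```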

When the means and variances of a feature are similar across classes, it is harder for Gaussian Naive Bayes to draw distinctions within the dataset, which leads the algorithm to make poor choices.  It can also indicate that the dataset is not well suited for classification.  However, if Gaussian Naive Bayes exhibits this behavior, I would recommend trying a different classification algorithm before reconsidering your approach.

Conclusion

Gaussian Naive Bayes is just another way to calculate the probability that an instance belongs to a class.  It’s geared towards features that are quantitative in nature and can be used to classify data.  Just like regular Naive Bayes, Gaussian Naive Bayes is ideal as a baseline for other algorithms.

Have any questions, comments, or spotted an error?  Leave a comment down below and I’ll get back to you ASAP.