In my post on Naive Bayes, I mentioned that there are multiple variants suited to different problems.  In this post, I will be introducing another variant of Naive Bayes, one that utilizes the Bernoulli distribution.

What is the Bernoulli distribution?

As mentioned in my probability distribution post, the Bernoulli distribution is a special case of the Binomial distribution (the case where n = 1).  The Binomial distribution determines the likelihood that an event occurs k times within n independent trials.  The Bernoulli distribution, on the other hand, only determines whether a single event has occurred.  The probability mass function for Bernoulli is defined as follows:

P(X) = \begin{cases} p & \text{if true}\\ 1 - p & \text{if false}\\ \end{cases}

Where X is a discrete random variable and p is the probability that the event occurs.  It’s also possible to write the function above more compactly as:

P(X) = p^k(1-p)^{1-k}

Where k \in \{0,1 \} indicates whether the event occurred (k = 1) or not (k = 0).
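The compact form above can be sketched directly in code.  This is a minimal illustration; the function name and the example value p = 0.7 are my own choices, not anything standard:

```python
def bernoulli_pmf(k: int, p: float) -> float:
    """Bernoulli PMF: P(X = k) = p^k * (1 - p)^(1 - k), for k in {0, 1}."""
    return p ** k * (1 - p) ** (1 - k)

# With p = 0.7, the event occurs with probability 0.7
# and does not occur with probability 0.3.
print(bernoulli_pmf(1, 0.7))  # 0.7
print(bernoulli_pmf(0, 0.7))  # ~0.3
```

Note how the exponents select one of the two cases: when k = 1 the (1 - p) term vanishes, and when k = 0 the p term vanishes, matching the piecewise definition.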

What is different about Bernoulli Naive Bayes?

Just like regular and Gaussian Naive Bayes, Bernoulli Naive Bayes still utilizes Bayes’ theorem and the assumption of independent features to make decisions.  However, the likelihood of each feature is calculated with the probability mass function of the Bernoulli distribution.  Additionally, Bernoulli Naive Bayes is geared toward different problems than regular and Gaussian Naive Bayes: its features must be binary, indicating only whether something occurred.
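Because of the independence assumption, the class-conditional likelihood is just a product of one Bernoulli PMF per feature.  Here is a small illustration; the per-feature probabilities and the feature vector are made-up values:

```python
# P(feature_i = 1 | class) for three binary features (assumed values)
p = [0.8, 0.1, 0.6]
# One example: features 1 and 3 occurred, feature 2 did not
x = [1, 0, 1]

# Multiply p_i^x_i * (1 - p_i)^(1 - x_i) across the features
likelihood = 1.0
for p_i, x_i in zip(p, x):
    likelihood *= p_i ** x_i * (1 - p_i) ** (1 - x_i)

print(likelihood)  # 0.8 * 0.9 * 0.6 ~= 0.432
```

Multiplying this likelihood by the class prior (per Bayes’ theorem) gives the score used to compare classes.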

For example, suppose you wanted to classify a type of document.  You take several hundred documents, group the words into a separate set per category, and calculate the probability that each word occurs in a document of that category.  You then run Bernoulli Naive Bayes, scoring each category on whether each word exists in the new document.  The category with the highest probability determines the type of document.
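The document-classification steps above can be sketched end to end.  This is a toy implementation with Laplace smoothing; the two categories, the tiny documents, and the uniform priors are all illustrative assumptions, not real data:

```python
import math

def train(docs_by_class, vocab):
    """Estimate P(word present | class) with Laplace (add-one) smoothing."""
    params = {}
    for label, docs in docs_by_class.items():
        n = len(docs)
        params[label] = {
            w: (sum(w in d for d in docs) + 1) / (n + 2)
            for w in vocab
        }
    return params

def predict(doc, params, priors):
    """Score each class with one Bernoulli PMF term per vocabulary word."""
    scores = {}
    for label, probs in params.items():
        log_prob = math.log(priors[label])
        for w, p in probs.items():
            # p if the word is present in the document, 1 - p if absent
            log_prob += math.log(p if w in doc else 1 - p)
        scores[label] = log_prob
    return max(scores, key=scores.get)

# Toy training data: each document is its set of words
docs_by_class = {
    "sports": [{"ball", "team", "score"}, {"team", "win"}],
    "finance": [{"stock", "market"}, {"market", "price", "score"}],
}
vocab = {"ball", "team", "score", "win", "stock", "market", "price"}
priors = {"sports": 0.5, "finance": 0.5}

params = train(docs_by_class, vocab)
print(predict({"team", "ball"}, params, priors))  # sports
```

Working in log space avoids underflow when the vocabulary is large, and note that absent words still contribute a (1 - p) factor: this is the key difference from the multinomial variant, which ignores words that do not appear.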

Pros and Cons

Since Bernoulli Naive Bayes is similar to the other variants, it shares similar properties:

  • Can be trained with little data.
  • Can be used on systems with limited resources.
  • They’re not the most accurate approach to make decisions.
  • They cannot be used with quantitative features; Bernoulli Naive Bayes in particular expects binary (present/absent) features.
  • They assume features are independent.  Whether this is an advantage or disadvantage depends on the dataset.

Despite these shortcomings, this kind of Naive Bayes has been used to filter spam.


Bernoulli Naive Bayes is used to determine whether an event has occurred.  Just like regular Naive Bayes, it’s very quick to train with little data.  However, it also shares several of the same disadvantages as regular Naive Bayes.  Despite its simplicity, Bernoulli Naive Bayes can be used as an effective document classifier.

Have any questions, comments, or spotted an error?  Leave a comment down below and I’ll get back to you ASAP.