### Algorithm: Bernoulli Naive Bayes

In my post on Naive Bayes, I mentioned that there are multiple variants suited to different problems. In this post, I introduce another variant of Naive Bayes, one that uses the Bernoulli distribution.

### Dataset: Mushroom Data Set

This week's dataset involves classifying the edibility of mushrooms given several attributes. I was originally going to compare Naive Bayes and decision trees on the dataset, but scikit-learn's estimators don't accept string-valued features when training models. Additionally, I'm not yet equipped to write a decision tree algorithm from scratch. Despite these setbacks, running Naive Bayes against this dataset yields very good results, with 99% accuracy.
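One common way around the string-feature limitation is to one-hot encode each categorical attribute into 0/1 columns, which also happens to be the kind of input Bernoulli Naive Bayes models. The sketch below is only an illustration of that idea, not necessarily what the post itself does: the rows are hypothetical toy data (not taken from the real Mushroom Data Set), and the feature names `odor` and `cap colour` are assumptions made for the example.

```python
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows of (odor, cap colour); NOT real mushroom data.
X_train = [
    ["almond", "brown"],
    ["almond", "white"],
    ["none", "brown"],
    ["foul", "white"],
    ["foul", "brown"],
    ["pungent", "white"],
]
y_train = ["edible", "edible", "edible", "poisonous", "poisonous", "poisonous"]

# OneHotEncoder turns every string category into its own 0/1 column;
# BernoulliNB then models each column as a Bernoulli variable per class.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), BernoulliNB())
model.fit(X_train, y_train)

predictions = model.predict([["almond", "brown"], ["foul", "white"]])
print(predictions)
```

Setting `handle_unknown="ignore"` means a category that never appeared during training is encoded as all zeros instead of raising an error, which is a design choice for this sketch rather than a requirement.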

### Algorithm: Gaussian Naive Bayes

Recall from my Naive Bayes post that there are several variants. The one I'll be talking about today is Gaussian Naive Bayes.

### Probability Distributions and Random Variables

Suppose I have two coins and flip both of them. The possible combinations are two heads, two tails, or one of each. Together, these combinations make up the *sample space*.

Now let's take this a step further. Using the example above, we want to determine the probability that each combination occurs.

Assuming the coins are fair and the flips are independent, we derive the following probabilities: 1/4 for two heads, 1/4 for two tails, and 1/2 for one of each.

Together, these probabilities form a *probability distribution*.
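As a quick check on the two-coin example, the sample space can be enumerated and each combination counted. This is a minimal sketch that assumes both coins are fair, so every ordered outcome is equally likely; the helper name `describe` is made up for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: each coin is fair, so all ordered outcomes are equally likely.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT


def describe(outcome):
    """Map an ordered outcome to the unordered combination from the text."""
    heads = outcome.count("H")
    return {2: "two heads", 1: "one of each", 0: "two tails"}[heads]


# Count how many ordered outcomes fall into each combination.
counts = Counter(describe(o) for o in outcomes)

# Probability of a combination = matching outcomes / total outcomes.
distribution = {event: Fraction(n, len(outcomes)) for event, n in counts.items()}

for event, p in distribution.items():
    print(f"{event}: {p}")
```

"One of each" is twice as likely as the other two combinations because it covers two ordered outcomes (heads then tails, and tails then heads), and the probabilities sum to 1 as any probability distribution must.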

### Dataset: Iris Flower dataset

This week's dataset is one of the best-known datasets in machine learning. Introduced in 1936 by Ronald Fisher, the iris dataset is often used to test the accuracy of machine learning algorithms.

### Deriving the Naive Bayes formula

In my previous post, I introduced a class of algorithms for solving classification problems. I also mentioned that Naive Bayes is based on Bayes' theorem. In this post, I will derive Naive Bayes using Bayes' theorem.
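For orientation, here is a brief outline of the standard textbook derivation. The notation is an assumption made for this sketch and may differ from the full post: $y$ is the class label and $x_1, \dots, x_n$ are the features.

```latex
% Bayes' theorem applied to a class y and features x_1, ..., x_n
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}

% The "naive" assumption: features are conditionally independent given the class
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)

% The denominator does not depend on y, so the predicted class is
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The variants of Naive Bayes differ only in how they model each $P(x_i \mid y)$, for example with a Bernoulli or a Gaussian distribution.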

### Algorithm: Naive Bayes

So far, the algorithms that I have talked about model the data in a linear manner. While these algorithms can be effective for simple problems, they are poorly suited to problems where there is a non-linear relationship between the features and the output. Such problems include voice, text, and image recognition, anomaly detection, game-playing bots, and any problem where there is no straightforward relationship with the features.

Some non-linear algorithm classes that can solve these kinds of problems include neural networks, decision trees, and clustering. These classes often have variants that suit different purposes. In this post, I'll be talking about a different classification algorithm called Naive Bayes.