### Dataset: Mushroom Data Set

May 26, 2017 · Tags: dataset, Naive Bayes · Category: Datasets

This week's dataset is about classifying the edibility of mushrooms given several attributes. I was originally going to compare Naive Bayes and decision trees on the dataset, but scikit-learn's estimators expect numeric input and won't accept string-valued features directly when training models. Additionally, I'm not yet equipped to write a decision tree algorithm from scratch. Despite these setbacks, running Naive Bayes against this dataset yields very good results, with 99% accuracy.
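String-valued features have to be turned into numbers before scikit-learn can train on them. scikit-learn ships `OneHotEncoder` and `OrdinalEncoder` in `sklearn.preprocessing` for this; the sketch below only shows the underlying idea of one-hot encoding in plain Python. The helper names (`fit_categories`, `one_hot`) and the three sample rows are made up for illustration and are not records from the actual dataset.

```python
def fit_categories(rows):
    """Collect the sorted set of distinct values seen in each column."""
    n_cols = len(rows[0])
    return [sorted({row[i] for row in rows}) for i in range(n_cols)]


def one_hot(rows, categories):
    """Turn each string value into a block of 0/1 indicators, one per category."""
    encoded = []
    for row in rows:
        vec = []
        for value, cats in zip(row, categories):
            vec.extend(1 if value == c else 0 for c in cats)
        encoded.append(vec)
    return encoded


# Hypothetical rows: (cap shape, odor). Not real records from the dataset.
rows = [
    ("bell", "almond"),
    ("convex", "foul"),
    ("bell", "foul"),
]

categories = fit_categories(rows)
X = one_hot(rows, categories)
print(categories)
print(X)
```

Each row becomes a purely numeric vector, which is the kind of input a scikit-learn estimator can be trained on.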

### Algorithm: Gaussian Naive Bayes

May 23, 2017 · Tags: algorithm, mathematics, classification, supervised, Naive Bayes, probability, Normal distribution · Categories: Algorithms, Machine Learning

Recall from my Naive Bayes post that the algorithm has several variants. Today I'll be talking about one of them: Gaussian Naive Bayes.
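As a rough illustration of the idea, Gaussian Naive Bayes models each numeric feature within each class as a normal distribution and picks the class with the highest log posterior. The sketch below is a minimal from-scratch version, not the implementation from the post; the function names, the small variance floor `eps`, and the toy measurements are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # eps keeps the variance strictly positive (an assumption of this sketch).
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, features):
    """Return the class with the largest log prior plus summed log likelihoods."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(features, means, variances)
        )
    return max(model, key=score)


# Toy data: two well-separated classes with two numeric features each.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [5.0, 6.2], [5.1, 5.9], [4.8, 6.0]]
y = ["a", "a", "a", "b", "b", "b"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))
print(predict(model, [5.0, 6.0]))
```

Working in log space avoids multiplying many small densities together, which would otherwise underflow on datasets with many features.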

### Probability Distributions and Random Variables

May 21, 2017 · Tags: mathematics, Naive Bayes, probability · Category: Mathematics

Suppose I had two coins and I flipped both of them. The possible combinations are two heads, two tails, or one of each. These combinations are all part of a *sample space*.

Now let's take this a step further. Using the example above, we want to determine the probability of each combination occurring.

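Assuming independence (and fair coins), we derive the following probabilities:

$$
P(\text{two heads}) = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}, \qquad
P(\text{two tails}) = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}, \qquad
P(\text{one of each}) = 2 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{2}
$$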

Together, these probabilities form a *probability distribution*.
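These probabilities can be checked by enumerating the sample space directly. The sketch below assumes both coins are fair, which the example implies but does not state, and groups outcomes by the number of heads so that heads-tails and tails-heads both count as "one of each".

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered outcome of flipping two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Group outcomes by number of heads so that HT and TH count as "one of each".
counts = Counter(flips.count("H") for flips in outcomes)

# With fair, independent coins every ordered outcome has probability 1/4.
distribution = {heads: Fraction(n, len(outcomes)) for heads, n in counts.items()}

for heads, prob in sorted(distribution.items()):
    print(f"{heads} head(s): {prob}")

# A probability distribution must sum to 1.
assert sum(distribution.values()) == 1
```

Note that "one of each" is twice as likely as either matching pair because two ordered outcomes map onto it.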

### No dataset this week

I just want to inform you guys that since it's my birthday, I would like to take a quick rest from analyzing a dataset. However, I will prepare another one for you guys next week.

If you guys have any questions for me, feel free to leave a comment down below or contact me.

Happy coding!

### Algorithm: Decision Trees

May 17, 2017 · Tags: algorithm, regression, classification, supervised, decision tree · Categories: Algorithms, Machine Learning

In my previous algorithm post, I talked about a family of algorithms called Naive Bayes. These algorithms use Bayes' theorem, independence, and probabilities to determine which category a test case belongs to. However, they don't take into account the relationships between features. Additionally, it would be nice to visualize how the model actually makes its decisions. Fortunately, decision trees allow us to see how each feature contributes to the classification.
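To make the point about visualizing decisions concrete, here is a small sketch of a decision tree stored as nested dictionaries, with a function that records the path taken for one sample. The tree is hand-written for illustration: the feature names and thresholds are hypothetical and not learned from any dataset, and a real tree would be fit by a library or a training routine.

```python
# A hand-built tree: internal nodes test one feature against a threshold,
# leaves carry a class label. All names and thresholds are hypothetical.
tree = {
    "feature": "petal_length",
    "threshold": 2.5,
    "left": {"label": "setosa"},
    "right": {
        "feature": "petal_width",
        "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}


def classify(node, sample):
    """Walk the tree and return (label, list of human-readable decisions)."""
    path = []
    while "label" not in node:
        name, threshold = node["feature"], node["threshold"]
        if sample[name] <= threshold:
            path.append(f"{name} = {sample[name]} <= {threshold}")
            node = node["left"]
        else:
            path.append(f"{name} = {sample[name]} > {threshold}")
            node = node["right"]
    return node["label"], path


label, path = classify(tree, {"petal_length": 4.8, "petal_width": 1.5})
print(label)
for step in path:
    print("  " + step)
```

The returned path reads like a chain of if/else checks, which is what makes a tree's predictions easy to inspect.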

### Dataset: Iris Flower dataset

May 13, 2017 · Tags: dataset, logistic regression, Naive Bayes, decision tree, Support Vector Machine · Category: Datasets

This week's dataset is one of the best-known datasets in machine learning. Introduced in 1936 by Ronald Fisher, the iris dataset is used to test the accuracy of machine learning algorithms.

### Deriving the Naive Bayes formula

May 11, 2017 · Tags: Naive Bayes · Category: Mathematics

In my previous post, I introduced a class of algorithms for solving classification problems. I also mentioned that Naive Bayes is based on Bayes' theorem. In this post, I will derive Naive Bayes using Bayes' theorem.
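In outline, and using notation assumed here rather than taken from the post ($y$ for the class and $x_1, \dots, x_n$ for the features), the derivation goes roughly as follows. Bayes' theorem gives the posterior, the naive conditional-independence assumption factorizes the likelihood, and the denominator can be dropped when choosing a class because it does not depend on $y$:

$$
\begin{aligned}
P(y \mid x_1, \dots, x_n) &= \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \\
P(x_1, \dots, x_n \mid y) &= \prod_{i=1}^{n} P(x_i \mid y) \\
\hat{y} &= \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{aligned}
$$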

### Algorithm: Naive Bayes

May 10, 2017 · Tags: algorithm, classification, supervised, Naive Bayes · Categories: Algorithms, Machine Learning

So far, the algorithms that I've talked about model the data in a linear manner. While these algorithms can be effective for simple problems, they are poorly suited to problems with a non-linear relationship between the features and the output. Such problems include voice, text, and image recognition, anomaly detection, game-playing bots, and any problem where the output has no straightforward relationship with the features.

Some non-linear algorithm classes that can solve these kinds of problems include neural networks, decision trees, and clustering. These classes often have variants that suit different purposes. In this post, I'll be talking about a different classification algorithm called Naive Bayes.

### Jupyter Notebooks are now on GitHub

I just want to let you guys know that, with the exception of my first dataset post, all past and future dataset analyses will be uploaded to GitHub. I decided to do this in case people want to expand upon what I did. For those who do, please credit me in the report.

As usual, I'll still be blogging about the datasets that I analyze and the lessons that I learned. The GitHub repository can be found here.

### Dataset: Human Resources Analysis (Kaggle)

May 6, 2017 · Tags: dataset, Kaggle · Category: Datasets

This week's dataset is on Kaggle's Human Resources Analysis. The question that the dataset asks is:

**Why are our best and most experienced employees leaving prematurely?**

I then asked the following question:

**How well can we predict whether an employee is going to leave?**

It's definitely possible to answer the second one with great accuracy. I used a decision tree because the features form a non-linear relationship. I have a picture of the tree, but it's way too big to upload to this post.