Probability Distributions and Random Variables

Suppose I had two coins and I flipped both of them.  The possible combinations are two heads, two tails, or one of each.  These combinations make up the sample space.

Now let's take this a step further and determine the probability that each combination occurs.

Assuming independence, we derive the following probabilities:

  • P(\text{two heads}) = 0.25
  • P(\text{two tails}) = 0.25
  • P(\text{one of each}) = 0.5

Together, these probabilities make up a probability distribution.
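
To make this concrete, here's a minimal Python sketch (my own illustration, not part of the original post) that enumerates the sample space for two fair coins and counts each combination:

    from collections import Counter
    from itertools import product

    # Enumerate every equally likely outcome of flipping two fair coins.
    sample_space = list(product(["H", "T"], repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

    # Label each outcome by how many heads it contains.
    def label(outcome):
        return {2: "two heads", 1: "one of each", 0: "two tails"}[outcome.count("H")]

    counts = Counter(label(o) for o in sample_space)
    distribution = {combo: n / len(sample_space) for combo, n in counts.items()}
    print(distribution)  # {'two heads': 0.25, 'one of each': 0.5, 'two tails': 0.25}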

Read more


No dataset this week

I just want to let you guys know that, since it's my birthday, I'm taking a quick break from analyzing a dataset this week.  However, I will prepare another one for you guys next week.

If you guys have any questions for me, feel free to leave a comment down below or contact me.

Happy coding!


Algorithm: Decision Trees

In my previous algorithm post, I talked about a family of algorithms called Naive Bayes.  These algorithms use Bayes' theorem, independence assumptions, and probabilities to determine which category a test case most likely belongs to.  However, they don't take into account the relationships between features.  Additionally, it would be nice to visualize how the model actually makes its decisions.  Fortunately, decision trees let us visualize how each feature contributes to the classification.
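
To see what that looks like in practice, here's a minimal scikit-learn sketch (my own illustration, using the built-in iris data) that fits a shallow tree and prints its decision rules as text:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Fit a shallow decision tree so the printed rules stay readable.
    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

    # Each line of the output shows a feature threshold the tree splits on.
    print(export_text(tree, feature_names=list(iris.feature_names)))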

Read more


Dataset: Iris Flower dataset

This week's dataset is one of the best-known datasets in machine learning.  Introduced in 1936 by Ronald Fisher, the iris dataset is commonly used to benchmark the accuracy of machine learning algorithms.  Read more
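
If you want to poke at the data before the full post, scikit-learn ships a copy of it.  The sketch below (the choice of classifier is arbitrary, purely illustrative) loads the dataset and scores a model on a held-out split:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # 150 samples, 4 features (sepal/petal length and width), 3 species.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print(model.score(X_test, y_test))  # accuracy on the held-out 30%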


Deriving the Naive Bayes formula

In my previous post, I introduced a class of algorithms for solving classification problems.  I also mentioned that Naive Bayes is based on Bayes' theorem.  In this post, I will derive Naive Bayes from Bayes' theorem.
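
As a preview, the starting point is Bayes' theorem, and the "naive" step is assuming the features x_1,\ldots,x_n are conditionally independent given the class y (the notation here is mine, just to set the stage):

P(y \mid x_1,\ldots,x_n) = \frac{P(y)\,P(x_1,\ldots,x_n \mid y)}{P(x_1,\ldots,x_n)} \approx \frac{P(y)\displaystyle\prod_{i=1}^{n}P(x_i \mid y)}{P(x_1,\ldots,x_n)}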

Read more


Algorithm: Naive Bayes

So far, the algorithms that I've talked about model the data in a linear manner.  While these algorithms can be effective for simple problems, they aren't well suited to problems where the relationship between the features and the output is non-linear.  Such problems include voice, text, and image recognition, anomaly detection, game-playing bots, and any problem where there is no straightforward relationship with the features.

Some non-linear algorithm classes that can solve these kinds of problems include neural networks, decision trees, and clustering.  These classes often have variants that suit different purposes.  In this post, I'll be talking about a different classification algorithm called Naive Bayes.
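
As a taste of where this is going, here's a minimal scikit-learn sketch (the tiny "spam vs. ham" corpus is made up purely for illustration) that trains a Naive Bayes text classifier:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Made-up miniature corpus, just to show the API in action.
    texts = ["win money now", "meeting at noon", "cheap money offer", "lunch at noon?"]
    labels = ["spam", "ham", "spam", "ham"]

    # Turn each text into word counts, then fit Multinomial Naive Bayes.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["free money"]))  # most likely ['spam'] on this toy data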

Read more


Jupyter Notebooks are now on github

I just want to let you guys know that, with the exception of my first dataset post, all past and future dataset analyses will be uploaded to GitHub.  I decided to do this in case people want to expand upon what I did.  For those who do, please credit me in your report.

As usual, I'll still be blogging about the datasets that I analyze and the lessons that I learn.  The GitHub repository can be found here.


Dataset: Human Resources Analysis (Kaggle)

This week's dataset is on Kaggle's Human Resources Analysis.  The question that the dataset asks is:

Why are our best and most experienced employees leaving prematurely?

I then asked the following question:

How well can we predict whether an employee is going to leave?

It's definitely possible to answer the second question with great accuracy.  I used a decision tree because the features form a non-linear relationship with the target.  I have a picture of the tree, but it's way too big to upload onto this post.  Read more
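
For anyone who wants to reproduce the result, the rough shape of the analysis looks like the sketch below (the file path is a placeholder, and the target column is assumed to be called "left"; adjust both to match the CSV you download from Kaggle):

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # "HR.csv" is a placeholder path; one-hot encode the categorical columns.
    df = pd.get_dummies(pd.read_csv("HR.csv"))
    X, y = df.drop(columns=["left"]), df["left"]   # "left" = 1 means the employee left

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
    print(model.score(X_test, y_test))  # accuracy on the held-out split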


Deriving the Cost Function for Logistic Regression

In my previous post, you saw the derivative of the cost function for logistic regression as:

\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_{ij}, \quad x_{i0}=1

I bet several of you were thinking, "How on Earth could you turn a cost function like this:

J(\theta_0,\theta_1,\ldots,\theta_n) = -\frac{1}{m}\displaystyle\sum_{i=1}^{m}\left[y_i\log(g(x_i)) + (1 - y_i)\log(1-g(x_i))\right]

into a nice derivative like this:

\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_{ij}?"

Well, this post is going to go through the math.  Even if you already know it, it's a good algebra and calculus problem. Read more
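
For the impatient, here's an outline of the key step (only a sketch; the post works through it carefully).  With g the sigmoid function, the derivatives of the two log terms are:

\frac{\partial}{\partial \theta_j}\log(g(x_i)) = (1 - g(x_i))\,x_{ij}, \qquad \frac{\partial}{\partial \theta_j}\log(1-g(x_i)) = -g(x_i)\,x_{ij}

Substituting these into J and simplifying the bracket gives the result:

\frac{\partial}{\partial \theta_j} J = -\frac{1}{m}\displaystyle\sum_{i=1}^{m}\left[y_i(1-g(x_i)) - (1-y_i)g(x_i)\right]x_{ij} = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)\,x_{ij}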


Algorithm: Logistic Regression

As I previously mentioned in my Linear Regression post, linear regression works best with regression-type problems.  Linear regression wouldn't work well on classification problems since it only returns numeric values on a continuous interval.  You could threshold the output into ranges and assign a class that way, but it's not good practice to use the algorithm in this manner.

So if you can't use linear regression for classification, what kind of algorithm can be used to classify test cases?  While there are many different algorithms that can be used in classification, one of the most basic algorithms is logistic regression.
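
At its core, logistic regression passes a linear combination of the features through the sigmoid (logistic) function, so the output can be read as a probability of the positive class and thresholded to make a prediction (a standard formulation, written in the same g(\cdot) notation as the cost-function post above):

g(x) = \frac{1}{1 + e^{-\theta^T x}}, \qquad \hat{y} = \begin{cases} 1 & \text{if } g(x) \geq 0.5 \\ 0 & \text{otherwise} \end{cases}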

Read more