Probability Distributions and Random Variables

Suppose I had two coins and I flipped both of them.  The possible combinations are two heads, two tails, or one of each.  Together, these combinations make up the sample space.

Now let's take this a step further.  Using the same two-coin setup, we want to determine the probability that each combination occurs.

Assuming the coins are fair and the flips are independent, we derive the following probabilities:

  • P(\text{two heads}) = 0.25
  • P(\text{two tails}) = 0.25
  • P(\text{one of each}) = 0.5

Note that "one of each" covers both heads-then-tails and tails-then-heads, which is why its probability is 0.25 + 0.25 = 0.5.  Together, these probabilities form a probability distribution.
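
To make this concrete, here is a minimal Python sketch (mine, not from the post) that enumerates the two-flip sample space and tallies the probability of each combination:

from itertools import product
from collections import Counter

# sample space for two independent flips of a fair coin
outcomes = list(product("HT", repeat=2))   # ('H','H'), ('H','T'), ('T','H'), ('T','T')

# group each outcome into one of the three combinations discussed above
counts = Counter(
    "two heads" if o == ("H", "H")
    else "two tails" if o == ("T", "T")
    else "one of each"
    for o in outcomes
)

# every outcome is equally likely, so probability = count / |sample space|
distribution = {combo: n / len(outcomes) for combo, n in counts.items()}
print(distribution)   # {'two heads': 0.25, 'one of each': 0.5, 'two tails': 0.25}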



Deriving the Naive Bayes formula

In my previous post, I introduced a class of algorithms for solving classification problems.  I also mentioned that Naive Bayes is based on Bayes' theorem.  In this post, I will derive Naive Bayes using Bayes' theorem.
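
For reference, here is a rough sketch of where the derivation is headed (my notation, not necessarily the post's).  Bayes' theorem for a class C given features x_1,\ldots,x_n is

P(C \mid x_1,\ldots,x_n) = \frac{P(C)\,P(x_1,\ldots,x_n \mid C)}{P(x_1,\ldots,x_n)}

and the "naive" step is assuming the features are conditionally independent given the class, which factors the likelihood as

P(x_1,\ldots,x_n \mid C) \approx \displaystyle\prod_{j=1}^{n} P(x_j \mid C)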



Deriving the Cost Function for Logistic Regression

In my previous post, you saw the derivative of the cost function for logistic regression as:

\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_{i,j}, \quad x_{i,0}=1

where x_{i,j} is the j-th feature of the i-th training example.

I bet several of you were thinking, "How on Earth could you take a cost function like this:

J(\theta_0,\theta_1,\ldots,\theta_n) = -\frac{1}{m}\displaystyle\sum_{i=1}^{m}[y_i\log(g(x_i)) + (1 - y_i)\log(1-g(x_i))]

and end up with a derivative as clean as this:

\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_{i,j}?"

Well, this post is going to walk through the math.  Even if you already know the result, it's a good algebra and calculus exercise.
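
As a quick sanity check on the two formulas above, here is a minimal Python sketch (mine, assuming NumPy; the helper names g, cost, and gradient are illustrative, not from the post).  It implements J(\theta) and the gradient formula, then compares the analytic gradient against a central finite-difference approximation on toy data:

import numpy as np

def g(theta, X):
    # sigmoid hypothesis applied row-wise: g(x_i) = 1 / (1 + exp(-theta . x_i))
    return 1.0 / (1.0 + np.exp(-X @ theta))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum_i [ y_i*log(g(x_i)) + (1 - y_i)*log(1 - g(x_i)) ]
    m = len(y)
    h = g(theta, X)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # dJ/dtheta_j = (1/m) * sum_i (g(x_i) - y_i) * x_{i,j}
    m = len(y)
    return (1.0 / m) * (X.T @ (g(theta, X) - y))

# toy data; the column of ones plays the role of x_{i,0} = 1
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

# central finite-difference approximation of the gradient
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(numeric, gradient(theta, X, y)))   # True

If the derivation is correct, the analytic and numerical gradients agree to within floating-point noise, which is what the final check prints.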