Probability Distributions and Random Variables

Suppose I had two coins and I flipped both of them.  The possible combinations are two heads, two tails, or one of each.  Together, these combinations make up the sample space.

Now let’s take this a step further.  Using the example above, we want to determine the probability that each combination occurs.

Assuming the coins are fair and the flips are independent, we get the following probabilities:

  • P(\text{two heads}) = 0.25
  • P(\text{two tails}) = 0.25
  • P(\text{one of each}) = 0.5
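
With fair, independent coins, the four ordered outcomes HH, HT, TH, and TT are equally likely, each with probability (0.5)(0.5) = 0.25.  Since “one of each” covers two of those outcomes,

P(\text{one of each}) = P(HT) + P(TH) = 0.25 + 0.25 = 0.5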

Together, these probabilities form a probability distribution: each one is non-negative, and they sum to 1.


Deriving the Cost Function for Logistic Regression

In my previous post, you saw the derivative of the cost function for logistic regression as:

\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_{ij}, \quad x_{i0}=1
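
Here g is the hypothesis from that post, presumably the familiar sigmoid applied to a linear combination of the features (written here as \theta^T x_i, with x_{i0} = 1):

g(x_i) = \frac{1}{1 + e^{-\theta^T x_i}}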

I bet several of you were thinking, “How on Earth could you turn a cost function like this:

J(\theta_0,\theta_1,\ldots,\theta_n) = -\frac{1}{m}\displaystyle\sum_{i=1}^{m}\left[y_i\log(g(x_i)) + (1 - y_i)\log(1-g(x_i))\right]

into a derivative as clean as this:

\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_{ij}?”

Well, this post is going to walk through the math.  Even if you already know it, it’s a good algebra and calculus exercise.
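
As a preview of the key step: if g is the sigmoid above, its partial derivative with respect to \theta_j takes the convenient form

\frac{\partial}{\partial \theta_j} g(x_i) = g(x_i)(1 - g(x_i))x_{ij}

and substituting that into the chain-rule expansion of the two log terms is what collapses everything down to (g(x_i) - y_i)x_{ij}.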