Dataset: Iris Flower dataset

This week's post covers one of the best-known datasets in machine learning.  Introduced in 1936 by Ronald Fisher, the Iris dataset is commonly used to test the accuracy of machine learning algorithms.


Deriving the Cost Function for Logistic Regression

In my previous post, you saw the derivative of the cost function for logistic regression as:

\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_{i,j}, \quad x_{i,0}=1

I bet several of you were thinking, "How on Earth could you differentiate a cost function like this:

J(\theta_0,\theta_1,\ldots,\theta_n) = -\frac{1}{m}\displaystyle\sum_{i=1}^{m}[y_i\log(g(x_i)) + (1 - y_i)\log(1-g(x_i))]

and end up with a nice function like this:

\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_{i,j}?"

Well, this post is going to go through the math.  Even if you already know it, it's a good algebra and calculus problem.
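
Before working through the algebra, it can be reassuring to check numerically that the two formulas agree. The sketch below is only an illustration, not the derivation itself: it assumes g is the sigmoid of the dot product of theta and x (with a leading 1 in each x for the intercept), and the tiny dataset, the starting theta, and all function names are made up for this example. It compares the closed-form gradient against a central finite-difference estimate of the cost function's slope.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def g(theta, x):
    # Hypothesis: sigmoid of theta . x, where x[0] is the constant 1.
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum(y*log(g) + (1-y)*log(1-g))
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        p = g(theta, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -total / m

def analytic_gradient(theta, X, y):
    # dJ/dtheta_j = (1/m) * sum((g(x_i) - y_i) * x_ij)
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = g(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij
    return [gj / m for gj in grad]

def numeric_gradient(theta, X, y, eps=1e-6):
    # Central difference: (J(theta + eps*e_j) - J(theta - eps*e_j)) / (2*eps)
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad

# Hypothetical toy data: each row is [1, feature].
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0, 0, 1, 1]
theta = [0.1, -0.2]

analytic = analytic_gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, numeric, max_diff)
```

If the derivative formula is right, the two gradients should agree to within the small error of the finite-difference approximation.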


Algorithm: Logistic Regression

As I previously mentioned in my Linear Regression post, linear regression works best with regression-type problems.  It doesn't work well on classification problems, since it returns numeric values on a continuous range rather than discrete class labels.  Now, you could pick a cutoff and assign a class whenever the output crosses it, but it's not very good practice to use the algorithm in this manner.
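
To see the problem concretely, here is a small sketch that fits an ordinary least-squares line to 0/1 labels. The data and the helper names (`fit_line`, `predict`) are invented for illustration. The point is only that the fitted line is unbounded, so its outputs can fall below 0 or rise above 1 and cannot be read directly as a class or a probability.

```python
def fit_line(xs, ys):
    # Closed-form simple linear regression: slope = cov(x, y) / var(x).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: one feature, binary labels.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_line(xs, ys)

def predict(x):
    return slope * x + intercept

for x in (1.0, 4.5, 8.0, 20.0):
    print(x, predict(x))
```

The predictions at the low and high ends of the feature range land outside the interval from 0 to 1, which is why a cutoff on a linear fit is an awkward way to classify.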

So if you can't use linear regression for classification, what kind of algorithm can be used to classify test cases?  While there are many different algorithms that can be used in classification, one of the most basic algorithms is logistic regression.
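
As a rough preview, a minimal logistic regression can be sketched in a few lines of plain Python. This is one simple way to do it under stated assumptions, not a definitive implementation: it uses batch gradient descent with the gradient formula from the cost function post, and the learning rate, step count, toy data, and function names (`train`, `probability`, `predict`) are all chosen just for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def probability(theta, x):
    # Estimated probability that x belongs to class 1; x[0] is the constant 1.
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def train(X, y, learning_rate=0.1, steps=5000):
    # Batch gradient descent on the logistic regression cost function.
    m = len(X)
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = probability(theta, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        theta = [t - learning_rate * gj / m for t, gj in zip(theta, grad)]
    return theta

def predict(theta, x, threshold=0.5):
    # Turn the probability into a class label.
    return 1 if probability(theta, x) >= threshold else 0

# Hypothetical toy data: each row is [1, feature].
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]

theta = train(X, y)
print(theta)
print([predict(theta, xi) for xi in X])
```

Unlike the linear fit, the output of `probability` always stays strictly between 0 and 1, so applying a threshold to it is a natural way to pick a class.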
