As I previously mentioned in my Linear Regression post, linear regression works best with regression type problems.  Linear regression wouldn’t work well on classification problem since only numeric values on a continuous interval would be returned.  Now, you could describe a range to classify the test case when an interval is reached, but it’s not very good practice to use the algorithm in this manner.

So if you can’t use linear regression for classification, what kind of algorithm can be used to classify test cases?  While there are many different algorithms that can be used in classification, one of the most basic algorithms is logistic regression.

What is Logistic Regression?

Logistic regression is a statistical model used to determine the likelihood of a response for a test case.  That is, what’s the probability that the test case can be classified for a particular group?

There are several variants that can be used, but all possible outputs are within the interval [0,1].  Additionally, we apply a threshold, often 0.5, on whether the test case is positive (1) or negative (0).  For this post, we’ll use a very common logistic function called the sigmoid function.  Below is the graph and the equation:

g(x) = \frac{1}{1+e^{-h(x)}}, h(x) = \displaystyle\sum_{i=1}^{m}(x_i\theta_i) + b

If you take note of h(x), the equation looks very similar to the one used in linear regression.  When it comes to determining the cost function for logistic regression, it’s a bit different from the one from linear regression (that’s the cost function defined in my post on Gradient Descent):

J(\theta_1,\ldots,\theta_n) = -\frac{1}{m}\displaystyle\sum_{i=1}^{m}[(y_i)log(g(x_i)) + (1 - y_i)log(1-g(x_i))]

While it looks intimidating, you’ll only work with the that one remains after applying y.  Fortunately, once you derive J, you end up getting a similar function like the one for linear regression:

\frac{\partial}{\partial \theta_i} J(\theta_0,\theta_1,\ldots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^{m}(g(x_i) - y_i)x_i, x_0=1

I know, it sounds farfetched that such an ugly cost function derives quite nicely.  Yet, you just need to do some algebra and calculus and you’ll see why the derivative looks similar to the one for linear regression.

How to use Logistic Regression?

Logistic regression, like linear regression, is a supervised learning algorithm.  That is, you need to label your data in order for the algorithm to work.

So far, I know of two possible ways one can use logistic regression.  The first way is to use it alone.  In this case, using logistic regression is similar to using linear regression.  You take some data, train the model using gradient descent or some other algorithm, and then use the trained model to predict whether a result is positive or negative.

Another way to use logistic regression is to use it alongside a neural network.  While beyond the topic of this post, you would sum up the product of the input and the weights associated with each neural connection.  You then apply logistic regression and a threshold to determine whether a neuron would fire off.  If it sounds confusing, that’s okay.  I’ll dedicate the topic of neural networks to a future post.

If you need to classify multiple categories, simply apply logistic regression against each category and determine which has the highest value.  That value will determine the best category.

Pros and Cons

Logistic regression has several benefits:

  • Conceptually, it’s easy to understand .
  • It’s also easy to code up by scratch.
  • It’s very efficient when the data is not too complex.
  • It’s very fast.

However, like all algorithms, logistic regression suffers from a few drawback.  Many of them are shared with linear regression:

  • It doesn’t work well with non-linear relationships.
  • As far as I know, it only works with numeric values
  • It only works with one boundary at a time.  For multiple boundaries, more complex models, like support vector machines (SVM), would be needed.

Conclusion

Logistic regression is a fast, easy to understand model used for classifying data.  It can not only be used alone, but it can also work alongside other machine learning algorithms.  However, it cannot deal with non-linear relationships and it only deals with a single boundary.  Therefore, when your data can be easily be modeled, logistic regression is very efficient.

Have any questions, comments, or spotted an error?  Leave a comment down below.