So far, I have mainly discussed classification algorithms that use probabilities to make decisions.  However, there are algorithms that don't require computing probabilities at all.  One such algorithm is the support vector machine.

What is a Support Vector Machine (SVM)?

A support vector machine is a supervised algorithm that doesn't use probabilities to make decisions.  Rather, it attempts to model the data by finding the hyperplane with the largest margin that separates the categories.  In other words, it tries to determine the best boundary for categorizing data.

To demonstrate this, suppose we have the following graph with three lines and two categories:

By User:ZackWeinberg, based on PNG version by User:Cyc – This file was derived from: Svm separating hyperplanes.png, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=22877598

The green line (H_1) fails to separate the categories, so it would not be a good solution for classifying data.  The blue line (H_2) is a bit better, since it does separate the two categories.  However, its margin is so small that new data points near the boundary are likely to be misclassified.  The red line (H_3) is the best choice: it separates the categories with the largest margin, leaving the most room for new data to be classified correctly.

How do Support Vector Machines work?

As I mentioned earlier, SVMs don't compute probabilities to categorize data.  Rather, training an SVM amounts to solving a convex optimization problem: finding the hyperplane that maximizes the margin.  As a brief introduction, a convex optimization problem asks us to minimize a convex function, possibly subject to constraints.  Due to the sheer depth of the topic, a full treatment is beyond the scope of this post.  However, because this is an important problem in Computer Science, I would like to cover it in the future.
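
For a taste of what this looks like, here is a sketch of the classic hard-margin formulation (real solvers typically work with a soft-margin or dual variant): given training points x_i with labels y_i ∈ {-1, +1}, we search for the weight vector w and bias b of the separating hyperplane:

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \quad \text{for all } i
$$

Since the margin works out to 2/||w||, minimizing ||w|| is the same as maximizing the margin, while the constraints keep every training point on the correct side of the hyperplane.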

Another aspect of SVMs is the use of kernels.  Kernels can be thought of as functions that change how the margin is generated, effectively letting the SVM draw non-linear boundaries.  For data that can't be separated by a straight line, a well-chosen kernel can make the categories separable.  Due to the possible variety, kernels are also beyond the scope of this post and will be talked about in the future.
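
As a quick illustration, here is a minimal sketch using scikit-learn (one library that implements SVMs; the dataset and parameters are just illustrative) on two concentric rings of points, which no straight line can separate:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: not separable by any straight line.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# A linear kernel struggles because the true boundary is a circle.
linear_svm = SVC(kernel="linear").fit(X, y)
print("linear kernel accuracy:", linear_svm.score(X, y))

# An RBF kernel implicitly maps the points into a space where the
# rings become separable, so the margin can curve around them.
rbf_svm = SVC(kernel="rbf").fit(X, y)
print("RBF kernel accuracy:", rbf_svm.score(X, y))
```

On this kind of data, the RBF kernel typically scores near 100% while the linear kernel hovers near chance, which is exactly the change in behavior that kernels provide.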

There are several algorithms that can be used to train SVMs, such as sequential minimal optimization (SMO).  However, the concepts can be very difficult for those not familiar with mathematical optimization.  Fortunately, for those just wanting to get something running, there are libraries that have already implemented SVMs.
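
For instance, here is a minimal sketch of the typical workflow with scikit-learn (one such library; the dataset and hyperparameters are placeholders, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small example dataset and hold out a portion for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train an SVM; the library solves the optimization problem for us.
model = SVC(kernel="rbf", C=1.0)
model.fit(X_train, y_train)

# Evaluate on data the model has never seen.
print("test accuracy:", model.score(X_test, y_test))
```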

Pros and Cons

There are several advantages to using SVMs:

  • They work well in high-dimensional spaces, even when the number of features exceeds the number of samples.  There is a limit to this advantage, though: performance degrades when features vastly outnumber samples.
  • Their behavior can be modified with the use of kernels, thus allowing them to tackle many different problems.
  • They can be memory efficient, since only the support vectors (a subset of the training points) are needed to define the decision boundary, as shown below.
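
To make the last point concrete, here is a small sketch (again with scikit-learn; the synthetic dataset is just for illustration) showing that the fitted model only needs to keep the support vectors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# 500 training points in two well-separated clusters.
X, y = make_blobs(n_samples=500, centers=2, random_state=0)

clf = SVC(kernel="linear").fit(X, y)

# Only the points on or inside the margin are stored; the rest of
# the training set plays no role in the decision boundary.
print("training points:", len(X))
print("support vectors:", len(clf.support_vectors_))
```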

However, SVMs do suffer from a few problems:

  • If the number of features greatly exceeds the number of samples, SVMs are prone to overfitting.  The kernel used and the regularization settings also affect performance.
  • Since training an SVM reduces to (meaning it is equivalent to solving) a convex quadratic programming problem, it can be slow: common solvers scale roughly quadratically to cubically with the number of samples.
  • The theory behind SVMs is very complex, and they are not easily implemented from scratch.

Conclusions

A support vector machine takes a different approach to classification by maximizing the margin that separates the data.  Its flexibility with kernels allows it to solve many different types of problems.  For those wanting to implement SVMs from scratch, there are many complex concepts that need to be understood.  Fortunately, those just wanting to use them in their applications can simply use a machine learning library.  Still, getting the most out of an SVM will require an understanding of the underlying concepts.

Despite the learning curve, a support vector machine provides a flexible and accurate way to classify data.

Have any questions, comments, or spotted an error?  Leave a comment down below and I’ll get back to you ASAP.