Sports.  Sports.  Sports.

Some people love watching them.  Others love playing them.  People in the US love their football, while those in Latin America love their soccer.  However much we argue over which sport is the best or whose favorite team is the best, many people love sports as a pastime and follow their teams religiously.

While I’m not a sports fan, I did come across an interesting dataset that ranks sports by how tough they are to pick up.  Although the dataset is framed as an objective ranking, I would like to ask a different question: based on the sports data and a person’s abilities, which sport would be the best fit for them?

Dataset Info

The original dataset is from an old webpage on the ESPN website.  The ratings were assigned by a panel of eight experts, who concluded that boxing is the toughest sport to pick up and fishing is the easiest.  To quantify their findings, they rated each sport on the following attributes:

  • Endurance: The ability to continue to perform a skill or action for long periods of time.
  • Strength: The ability to produce force.
  • Power: The ability to produce strength in the shortest possible time.
  • Speed: The ability to move quickly.
  • Agility: The ability to change direction quickly.
  • Flexibility: The ability to stretch the joints across a large range of motion.
  • Nerve: The ability to overcome fear.
  • Durability: The ability to withstand physical punishment over a long period of time.
  • Hand-Eye Coordination: The ability to react quickly to sensory perception.
  • Analytic Aptitude: The ability to evaluate and react appropriately to strategic situations.
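To compare a person with a sport, it helps to store both as the same ten attributes in a fixed order, so the two can be compared position by position. The sketch below is a hypothetical helper: the names `ATTRIBUTES` and `to_vector` and the snake_case keys are illustrative choices, not taken from the dataset's column headers or the author's program, and it assumes each profile arrives as a dict keyed by attribute name.

```python
from typing import Dict, Tuple

# The ten attributes used by the ESPN panel, in a fixed order.
# The snake_case keys are illustrative, not the dataset's actual column names.
ATTRIBUTES: Tuple[str, ...] = (
    "endurance",
    "strength",
    "power",
    "speed",
    "agility",
    "flexibility",
    "nerve",
    "durability",
    "hand_eye_coordination",
    "analytic_aptitude",
)


def to_vector(profile: Dict[str, float]) -> Tuple[float, ...]:
    """Convert an {attribute: rating} dict into a tuple ordered like ATTRIBUTES.

    Raises ValueError if any attribute is missing or an unknown key is present,
    so a user and a sport are always compared attribute by attribute.
    """
    missing = [a for a in ATTRIBUTES if a not in profile]
    unknown = [k for k in profile if k not in ATTRIBUTES]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return tuple(float(profile[a]) for a in ATTRIBUTES)
```

For example, `to_vector({name: 5 for name in ATTRIBUTES})` returns a tuple of ten `5.0` values, whatever order the dict keys were written in.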

The Recommendation Mechanism

The point of a recommendation is to introduce the end user to new products.  In our case, we want to recommend the sport that is best suited to the user’s abilities.  Since our use case is simple, we’ll use a basic error measure known as mean squared error (MSE).  We define the formula as follows:

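\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( h(x^i) - y^i \right)^2

where n is the number of values being compared, h(x^i) is the i-th expected value, and y^i is the i-th actual value.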

Mean squared error measures how far a set of expected values lies from the corresponding actual values.  It squares each difference, so that positive and negative errors cannot cancel each other out and larger gaps are penalized more heavily, and then averages the squared differences.
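As a small worked example with made-up numbers (three values instead of ten, chosen only to show the arithmetic): if the expected values are 5, 3 and 7 and the actual values are 6, 3 and 4, the differences are -1, 0 and 3, so

\text{MSE} = \frac{1}{3} \left[ (5 - 6)^2 + (3 - 3)^2 + (7 - 4)^2 \right] = \frac{1 + 0 + 9}{3} = \frac{10}{3} \approx 3.33

The single gap of 3 contributes 9 of the 10 total, which shows how squaring lets large mismatches dominate the score.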

With this technique, we can measure the error between a user’s abilities and the difficulty ratings of each sport.  The sport with the lowest error is the closest match, so it becomes the recommendation.
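A minimal sketch of the recommendation step, assuming every user and every sport is already a list of ten ratings in the same attribute order. The function names `mse` and `recommend`, the sport names, and all ratings in the example are hypothetical placeholders, not values from the ESPN data or code from the author's GitHub program.

```python
from typing import Dict, List, Sequence, Tuple


def mse(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between two equal-length rating vectors."""
    if len(expected) != len(actual) or not expected:
        raise ValueError("vectors must be non-empty and the same length")
    return sum((e - a) ** 2 for e, a in zip(expected, actual)) / len(expected)


def recommend(
    user: Sequence[float], sports: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank sports from best to worst fit (lowest MSE first)."""
    scores = [(name, mse(user, ratings)) for name, ratings in sports.items()]
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical ratings for illustration only (ten attributes each).
    sports = {
        "sport_a": [8, 7, 7, 6, 6, 5, 8, 8, 7, 6],
        "sport_b": [2, 2, 2, 1, 1, 1, 2, 1, 4, 5],
    }
    user = [3, 2, 3, 2, 2, 1, 2, 2, 4, 6]
    ranking = recommend(user, sports)
    print(ranking[0][0])  # → sport_b (MSE 0.6 vs 19.5 for sport_a)
```

Returning the full ranking rather than only the best match makes it easy to suggest a few runner-up sports as well.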

Shortcoming in Dataset

As with all things, the dataset is not perfect.  Its ratings implicitly factor in biological traits that the ten attributes do not capture.  For example, a person’s height affects how fast they can swim.  While we are using the dataset for a different purpose, this flaw still influences which sports our program recommends.  As a result, the program may recommend the wrong sport to someone whose physical makeup differs from what the ratings assume.


To be fair, using this dataset to pick a sport for you is rather ridiculous.  Most kids simply try out a sport and find out for themselves whether it is the right fit.  Nevertheless, the exercise shows a simple way to recommend items to users.

The sport recommendation program can be found on my GitHub.