Popular Kaggle Kernels dataset

With data science being such a popular field to get into, it's no surprise that the number of contributions to Kaggle has increased dramatically. I recently stumbled across a dataset of the most popular kernels and decided to do some exploratory data analysis on it.

Kaggle's Digit Recognizer dataset

One of the hottest disciplines in the tech industry in 2017 was deep learning. Because of it, many startups put an emphasis on AI, and many frameworks have been developed to make implementing these algorithms easier. Google's DeepMind was even able to create AlphaGo Zero, which mastered the game of Go without relying on human game data. However, this analysis is much more basic than any of those recent developments. In fact, the dataset is the popular MNIST database: a collection of handwritten digits used to test computer vision algorithms.

Dataset: Paradis Bilingual Corpus

It's been a while since I last analyzed a dataset. Today's dataset is a corpus on children learning English as a second language. The study was done by Johanne Paradis from the University of Alberta.

Dataset: Anime Recommendations Database

This week's goal is to determine the most recommended anime from a list of anime shows and user ratings. To produce a list of recommended shows, I built a very primitive recommendation system based on two criteria:

Dataset: Human Resources Analysis (Kaggle)

This week's dataset is Kaggle's Human Resources Analysis. The question the dataset asks is:

Why are our best and most experienced employees leaving prematurely?

I then asked the following question:

How well can we predict whether an employee is going to leave?

It's definitely possible to answer the second question with great accuracy. I used a decision tree because the features form a non-linear relationship. I have a picture of the tree, but it's way too big to upload to this post.

Dataset: Los Angeles Crimes 2012-2016

This week's dataset explores crimes that occurred in Los Angeles between 2012 and 2016. I had two objectives in mind when working with this dataset. The first was to observe crime patterns and see whether anything interesting popped out. The second was to get more experience manipulating data with pandas.