July 24, 2017 | dataset, Kaggle, NLP | Datasets
It's been a while since I last did an analysis on a dataset. Today's dataset will focus on a corpus that deals with children learning English as a second language. The study was done by Johanne Paradis from the University of Alberta.
by Joseph Woolf
July 11, 2017 | algorithm, Cluster Analysis, clustering, k-means | Algorithms, Machine Learning
In my previous post, I mentioned that there are many different algorithms that can be used to cluster a dataset. One of the most popular is the k-means clustering algorithm.
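To make the idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code from the original post; the function name `kmeans`, the random initialization, and the `max_iters`/`seed` parameters are illustrative assumptions. Each iteration assigns every point to its nearest centroid, then moves each centroid to the mean of the points assigned to it, stopping once the assignments no longer change.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm.

    Hypothetical helper for illustration; returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Initialize centroids by picking k distinct input points at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return centroids, labels
```

For two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the first three points end up sharing one label and the last three share the other. Because the result depends on the initial centroids, library implementations usually run several random restarts (scikit-learn's `KMeans` exposes this as `n_init`).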
June 27, 2017 | algorithm, classification, Cluster Analysis, clustering, unsupervised | Algorithms, Machine Learning
If you were to go online and start shopping, chances are you'd be bombarded with suggestions from online stores. These suggestions aren't random, though; they're based on what you recently browsed and purchased. How do these sites determine what to recommend and what to ignore?
The system described above is called a recommendation system. One common way to implement it is through a method called clustering, which itself falls under cluster analysis.
June 20, 2017 | algorithm, classification, Naive Bayes, Bernoulli | Algorithms, Machine Learning
In my post on Naive Bayes, I mentioned that there are multiple variants suited to different problems. In this post, I will introduce another variant of Naive Bayes that uses the Bernoulli distribution.
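As a rough illustration, the sketch below trains a Bernoulli Naive Bayes classifier on 0/1 feature vectors (for example, "does word j appear in this document?"). The function names and the Laplace smoothing parameter `alpha` are assumptions for this example, not taken from the post. The key difference from the multinomial variant shows up in `predict_bernoulli_nb`: a feature that is absent (0) still contributes log(1 - p), so missing features count as evidence too.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(X, y, alpha=1.0):
    """X: list of binary (0/1) feature vectors; y: list of class labels."""
    n_features = len(X[0])
    counts = defaultdict(int)                      # number of samples per class
    ones = defaultdict(lambda: [0] * n_features)   # per-class count of feature == 1
    for xi, yi in zip(X, y):
        counts[yi] += 1
        for j, v in enumerate(xi):
            ones[yi][j] += v
    priors = {c: counts[c] / len(y) for c in counts}
    # Laplace-smoothed estimate of P(x_j = 1 | class c).
    probs = {
        c: [(ones[c][j] + alpha) / (counts[c] + 2 * alpha) for j in range(n_features)]
        for c in counts
    }
    return priors, probs


def predict_bernoulli_nb(model, x):
    """Return the class with the highest log posterior for binary vector x."""
    priors, probs = model

    def log_score(c):
        s = math.log(priors[c])
        for v, p in zip(x, probs[c]):
            # Both present (v == 1) and absent (v == 0) features contribute.
            s += math.log(p if v else 1 - p)
        return s

    return max(priors, key=log_score)
```

Working in log space avoids the numerical underflow that comes from multiplying many small probabilities together.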
June 13, 2017 | algorithm, classification, supervised, Support Vector Machine | Algorithms, Machine Learning
So far, I've mainly discussed classification algorithms that use probabilities to make decisions. However, some algorithms don't require computing probabilities at all. One such algorithm is the support vector machine.
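For a feel of how an SVM decides without probabilities, here is a minimal sketch of a linear soft-margin SVM trained by subgradient descent on the hinge loss. This is a deliberate simplification and an assumption on my part, not the method from the post: libraries such as libsvm (used by scikit-learn's `SVC`) solve the dual quadratic program instead, and this version supports only a linear kernel. The learning rate `lr`, regularization strength `lam`, and epoch count are arbitrary illustrative values.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Fit a linear soft-margin SVM by subgradient descent on the hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    Returns (w, b) for the decision rule sign(w . x + b).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with room to spare: only the L2 penalty shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def svm_predict(w, b, x):
    """Classify x by which side of the hyperplane w . x + b = 0 it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

The prediction only checks the sign of w . x + b; no probability is computed at any point. The margin-maximizing behavior comes from the `margin < 1` condition, which keeps pushing the boundary away from points that sit too close to it.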
June 10, 2017 | dataset, association rule learning, apriori | Datasets
In this week's dataset, I worked with the Belgium retail market dataset. In my previous post, I talked about how Apriori can be used to generate association rules, so I searched for a good dataset to apply the Apriori algorithm to. The dataset consists of over 88,000 transactions with over 16,000 different items. While the dataset only contains numbers, we can still apply the algorithm. This analysis demonstrates how support and confidence influence the number of rules generated.
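To show how the two thresholds interact, here is a small sketch that counts single-item rules (a -> b) on a hypothetical five-transaction dataset. The items are integers, like the item IDs in the retail data, but the transactions are made up for illustration. Support is the fraction of transactions containing both items, and confidence is support({a, b}) / support({a}). The helper `pair_rules` is illustrative and only considers one-item antecedents and consequents.

```python
from itertools import permutations


def pair_rules(transactions, min_support, min_confidence):
    """Return (a, b, support, confidence) for rules a -> b meeting both thresholds.

    Hypothetical helper: only single-item antecedents/consequents are checked.
    """
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= t for t in sets) / n

    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue  # the pair is too rare to produce a rule
        conf = s_ab / support({a})
        if conf >= min_confidence:
            rules.append((a, b, s_ab, conf))
    return rules


transactions = [[1, 2, 3], [1, 2], [1, 3], [2, 3], [1, 2, 3, 4]]
for min_sup, min_conf in [(0.5, 0.7), (0.5, 0.8), (0.2, 0.7)]:
    print(min_sup, min_conf, len(pair_rules(transactions, min_sup, min_conf)))
```

On this toy data, `min_support=0.5, min_confidence=0.7` yields 6 rules, raising the confidence threshold to 0.8 yields none, and lowering support to 0.2 admits the rarer item 4 and brings the count to 9. The same pattern, where lower thresholds produce more rules, is what the analysis explores at a much larger scale.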
June 7, 2017 | algorithm, association rule learning, apriori | Algorithms, Machine Learning
So far, I've talked about regression and classification algorithms that can be used to solve problems. Sometimes, though, we just want to discover associations within our data. These associations can, in turn, be used by a business to optimize profits.
One of the fundamental algorithms for solving these kinds of problems is the Apriori algorithm.
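Below is a minimal sketch of the frequent-itemset phase of Apriori in plain Python, assuming transactions are lists of hashable items and `min_support` is a fraction between 0 and 1. The function name and return format are my own choices for illustration. It grows itemsets one item at a time and uses the Apriori property (every subset of a frequent itemset must itself be frequent) to prune candidates before counting them. Turning the resulting itemsets into rules by checking confidence is a separate step not shown here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions containing the whole itemset.
        return sum(itemset <= t for t in sets) / n

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in sets for i in t}
    frequent = {c: s for c in singles if (s := support(c)) >= min_support}
    level = set(frequent)
    k = 2
    while level:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning.
        counted = {c: support(c) for c in candidates}
        level = {c for c, s in counted.items() if s >= min_support}
        frequent.update({c: counted[c] for c in level})
        k += 1
    return frequent
```

The pruning step is what makes Apriori practical: on data with thousands of items, most candidate itemsets are discarded without ever scanning the transactions for them.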
June 2, 2017 | dataset, Kaggle, recommendation system | Datasets
This week's goal is to determine the most recommended anime from a list of anime shows and user ratings. To produce that list, I built a very primitive recommendation system based on two criteria:
May 30, 2017 | algorithm, classification, supervised, decision tree, ID3 | Algorithms, Machine Learning
In my decision tree post, I mentioned several different algorithms that can be used to create a decision tree. Today, I'll be talking about a decision tree algorithm called Iterative Dichotomiser 3 (ID3).
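Here is a compact sketch of ID3 for categorical attributes, assuming each row is a tuple of attribute values and that the tree is returned as nested dictionaries (a representation chosen for this example, not taken from the post). At each node it picks the attribute with the highest information gain, meaning the largest drop in entropy, then recurses on each value of that attribute. It stops at pure nodes, or falls back to a majority vote when no attributes remain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on the categorical attribute at index attr."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Build a tree as nested dicts: {attr_index: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: every sample has the same class
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority vote
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    remaining = [a for a in attrs if a != best]
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree
```

For rows `[("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]` labeled `["no", "no", "yes", "yes"]`, attribute 0 separates the classes perfectly (a gain of 1 bit) while attribute 1 gives no gain, so `id3(rows, labels, [0, 1])` returns a tree equal to `{0: {"sunny": "no", "rainy": "yes"}}`.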
May 26, 2017 | dataset, Naive Bayes | Datasets
This week's dataset involves classifying the edibility of mushrooms from several attributes. I was originally going to compare Naive Bayes and decision trees on the dataset, but scikit-learn doesn't accept string-valued features when training models. Additionally, I'm not yet equipped to write a decision tree algorithm from scratch. Despite these setbacks, running Naive Bayes on this dataset yields very good results, with 99% accuracy.
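One way around the string limitation is to implement categorical Naive Bayes directly, since it only needs counts of each attribute value per class. The sketch below is illustrative, not the code from the post; the function names, the smoothing parameter `alpha`, and the toy attribute values are assumptions. In newer scikit-learn releases, an alternative is to convert the string columns with `OrdinalEncoder` and train `CategoricalNB` on the result.

```python
import math
from collections import Counter, defaultdict


def train_categorical_nb(rows, labels, alpha=1.0):
    """rows: tuples of string attribute values; labels: class names."""
    class_counts = Counter(labels)
    n_attrs = len(rows[0])
    # value_counts[c][j][v] = how often attribute j equals v within class c
    value_counts = defaultdict(lambda: [Counter() for _ in range(n_attrs)])
    vocab = [set() for _ in range(n_attrs)]  # distinct values seen per attribute
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            vocab[j].add(v)
    return class_counts, value_counts, vocab, alpha, len(labels)


def predict_categorical_nb(model, row):
    """Return the class with the highest log posterior for a row of strings."""
    class_counts, value_counts, vocab, alpha, n = model

    def log_score(c):
        s = math.log(class_counts[c] / n)
        for j, v in enumerate(row):
            # Laplace smoothing keeps an unseen value from zeroing out a class.
            s += math.log((value_counts[c][j][v] + alpha) /
                          (class_counts[c] + alpha * len(vocab[j])))
        return s

    return max(class_counts, key=log_score)
```

For a hypothetical toy set where cap shape is uninformative but odor is not, such as rows `[("convex", "almond"), ("flat", "almond"), ("convex", "foul"), ("flat", "foul")]` labeled `["e", "e", "p", "p"]`, the model predicts `"e"` for `("convex", "almond")` and `"p"` for `("flat", "foul")`.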