An Introduction to Chatbots

With the rise of Siri, Google Home, Alexa, and Cortana, it's obvious that there's a demand for chatbots.  In the past, chatbots were more of a niche technology due to limited functionality.  With recent advancements in computer technology, chatbots have now become practical for everyday use.

What is a Chatbot?

First, let’s define the term “chatbot.”  What exactly is a chatbot?

Think of it like a customer support representative.  You contact support, they ask about the problem, you describe the problem you’re having, they ask further questions to pinpoint it, and eventually you get a solution.

Now, replace the person with a computer program, the program being an on-demand Q&A application.  That is a chatbot.

Types of Chatbots

Designing chatbots can be quite complex since you’re dealing with intensive computing power, immense datasets, and the ambiguity of natural language.  However, we can group them into two main types.

Rule-based Chatbot

In the rule-based approach, a chatbot answers questions based on a series of rules.  These rules are predefined by the developer and, depending on the user’s actions, can trigger other rules.

The rule-based approach makes developing chatbots simpler as you only need to work in a very limited context.  However, this simplicity also prevents chatbots from getting smarter.

Take the image below as an example.

In the image, our chatbot is geared towards helping users shop on an e-commerce website.  When the user goes to the chatbot, the bot will first greet the user.  Usually this is a simple "hello."

While the options aren't limited to the ones shown above, the user can ask to place items into their shopping cart, determine whether an item is in stock, and understand the refund policy.

Once the user types in a command, the chatbot will perform various actions depending on the task.  The chatbot does the heavy lifting in the background and notifies the user once the action has completed.  The user can either follow up with additional commands or just end the conversation.

However, the chatbot cannot handle tasks outside its domain like tracking items for delivery or buying tickets for the movie theater near you.
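The flow described above can be sketched in a few lines of Java.  This is only a minimal illustration, assuming each rule is a keyword that triggers a canned reply; the class name, keywords, and replies are all hypothetical and not taken from any real chatbot framework.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Minimal rule-based chatbot sketch for a hypothetical e-commerce site. */
final class RuleBasedChatbot {
    // Reply used when no rule matches, i.e. the request is outside the bot's domain.
    private static final String FALLBACK = "Sorry, I can't help with that.";

    // Each rule maps a trigger keyword to a canned reply (all hypothetical).
    // LinkedHashMap keeps the rules in the order the developer defined them.
    private final Map<String, String> rules = new LinkedHashMap<>();

    RuleBasedChatbot() {
        rules.put("cart", "I've added the item to your shopping cart.");
        rules.put("stock", "Let me check whether that item is in stock.");
        rules.put("refund", "Here is a summary of our refund policy.");
    }

    /** The greeting shown when the user first opens the chatbot. */
    String greet() {
        return "Hello! How can I help you today?";
    }

    /** Returns the reply of the first rule whose keyword appears in the input. */
    String reply(String userInput) {
        String normalized = userInput.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (normalized.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        RuleBasedChatbot bot = new RuleBasedChatbot();
        System.out.println(bot.greet());
        System.out.println(bot.reply("Is the blue kettle in stock?"));
        // No rule covers delivery tracking, so the fallback reply is used.
        System.out.println(bot.reply("Where is my delivery?"));
    }
}
```

Notice that every capability has to be written down as a rule in advance.  That is exactly why the delivery question falls through to the fallback reply.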

AI-based

With AI chatbots, you use a machine learning model to train your chatbot to handle user input.  Oftentimes, chatbots use Deep Learning to derive a model.  Additionally, you can tack on voice-to-text recognition to make communication easier for the user.

Unlike the rule-based approach, you only supply training data to the model and the model will be tailored to the dataset.  This flexibility allows the chatbot to handle complex sentences.  However, since Deep Learning is complex, it's harder to fine-tune the model.  Additionally, it can be overkill for chatbots that work in simple environments.
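To make the "supply training data instead of rules" idea concrete, here is a rough sketch in Java.  It deliberately swaps Deep Learning for a much simpler technique, a Naive Bayes text classifier with add-one smoothing, so that it fits in one file.  The class name, intent labels, and training sentences are all made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Tiny Naive Bayes intent classifier (a stand-in for a real Deep Learning model). */
final class IntentClassifier {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> exampleCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalExamples = 0;

    // Lowercases the text and splits it on anything that isn't a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Adds one labeled sentence to the model's word counts. */
    void train(String sentence, String intent) {
        Map<String, Integer> counts = wordCounts.computeIfAbsent(intent, k -> new HashMap<>());
        for (String token : tokenize(sentence)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(intent, 1, Integer::sum);
            vocabulary.add(token);
        }
        exampleCounts.merge(intent, 1, Integer::sum);
        totalExamples++;
    }

    /** Returns the intent with the highest log-probability, or null if untrained. */
    String predict(String sentence) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String intent : wordCounts.keySet()) {
            // Start from the prior: how often this intent appeared in training.
            double score = Math.log((double) exampleCounts.get(intent) / totalExamples);
            Map<String, Integer> counts = wordCounts.get(intent);
            int total = totalWords.getOrDefault(intent, 0);
            for (String token : tokenize(sentence)) {
                if (token.isEmpty()) continue;
                // Add-one smoothing keeps unseen words from zeroing out the score.
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = intent;
            }
        }
        return best;
    }

    /** Builds a classifier from a handful of hypothetical training sentences. */
    static IntentClassifier demo() {
        IntentClassifier model = new IntentClassifier();
        model.train("add this to my cart", "add_to_cart");
        model.train("put the shoes in my basket", "add_to_cart");
        model.train("is this item in stock", "check_stock");
        model.train("do you have any left", "check_stock");
        model.train("how do I get a refund", "refund_policy");
        model.train("can I return my order", "refund_policy");
        return model;
    }

    public static void main(String[] args) {
        IntentClassifier model = demo();
        System.out.println(model.predict("please put it in my cart"));
        System.out.println(model.predict("I want a refund for my order"));
    }
}
```

No reply rules are written by hand here.  Changing the chatbot's behavior means changing the training sentences, which is the trade-off described above: more flexibility, but less direct control over what the model does.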

Why the resurgence?

Simply put, we have three things that are going for us today:

  1. Computing power - While there was extensive theory on AI methods in the mid-20th century, the amount of computing power wasn't sufficient.  Thanks to the trend described by Moore's Law, processing power has increased quickly and dramatically.
  2. Huge datasets - In the past, there wasn't a lot of data to be had for training and utilizing AI models.  With the rise of the internet and the complex system architectures needed to handle petabytes of data, we now have access to an abundant amount of data.
  3. Resurgence of AI - In the early days of AI research, people were overhyping the practicality of AI.  As a result, an AI winter occurred in the 1980s and 1990s.  During this time, there wasn't much research going on due to reduced funding.  However, with the addition of the former two points, AI became more practical for everyday products and solutions.  Whether we will hit another AI winter is up for debate, but there's definitely much research going on at the moment.

Conclusion

While we haven't seen general AI, we have definitely seen more powerful chatbots taking hold in our daily lives.  In fact, you can find many blog posts on how to build your own chatbot.  It wouldn't be surprising if chatbots became very human-like in a few years.  Now, whether or not you'll talk to chatbots more than humans is a discussion for another day.


A Follow-Up on AutoWeber: The Mistakes I Made In Design

In my previous post, I talked about a proof of concept on developing a self-adapting web scraper.  As I was adding onto the project, I was having difficulty adding constraints for improving structure accuracy.  After some time, I came to one conclusion:  My Initial Design Was Flawed!

Read more


A Proof of Concept on a Self-Adapting Web Scraper

Last year, I created the IssueHunt-Statistics website project to track repositories, issues, and funding for open source projects.  Shortly after, however, the website changed and my project broke down.  I did change the scraping code to bring back functionality, only for it to break down again a little while later.

I now have a problem.  I don't want to constantly spend time reworking the scraping code to keep it functional.  I wonder if I could automate this task?

Read more


My Themes For 2019

We're circling around the first full week of 2019 (it's Monday for God's sake). I do admit, I'm not very proud of 2018. Here's why:

  • I've been going to Toastmasters to improve my public speaking skills. However, I only gave five speeches last year, and two of them were in December.
  • In 2017, I went to a lot more technical Meetups. In 2018, I didn't go to many Meetups.
  • Obviously, I didn't make many contributions to my blog in 2018.

After some thinking in the last week of 2018 and first week of 2019, I've come up with two themes:

Read more


Project: IssueHunt Statistics

To keep up with advances in technology, one activity that software engineers often do is contribute to Open Source.  I'll be restricting this to contributing to other people's existing projects, not your own projects.

However, there are some obstacles when contributing:

  • Since many tools used in the community are Open Source, there are very strict standards that must be followed.  Thus, the process of contributing to existing projects can be quite a headache.
  • If a project is small and the owner isn't active on a regular basis, it can be hard for your work to be merged into the project.
  • Some project communities can be toxic.  The Linux kernel community has experienced a lot of toxicity from Linus Torvalds, the Linux founder.
  • Many professional software engineers have non-compete agreements that forbid them from programming in their free time.  Those who don't often have other commitments.
  • If you're not getting paid to contribute during working hours, why bother?

Some would see not contributing to Open Source as selfish.  After all, you get to use free tools and you should be grateful.  I honestly don't like this line of thinking.  Not everyone wants to spend their entire time programming.  Some projects have contributing policies that are a hassle to deal with.  Some would like to do a side hustle and earn extra money.

Fortunately, there are a couple of websites that focus on earning money while contributing to Open Source.  I ran across a few different sites:

  • IssueHunt - I noticed that this site mainly focuses on web projects.  If you want to contribute, I recommend having a background in JavaScript and TypeScript.
  • BountySource - Has a much more active user base with more variety.
  • Gitcoin - The tasks on this site focus more on Blockchain.  You can be rewarded with Ethereum as well as cash.

For this post, I'll be mainly focusing on IssueHunt.

Read more


Natural Language Processing: Working With Human Readable Data

Most of the models in machine learning require working with numbers.  After all, many of the machine learning algorithms we've seen are derived from statistics (Linear Regression, Logistic Regression, Naive Bayes, etc.).  Additionally, machines can understand and work with numbers a lot more easily than us humans.

However, machines just process the numbers and execute algorithms.  They don't interpret the numbers returned.  They don't understand the context of the data.  They especially don't understand human intricacies and can easily be taken advantage of by rogue players.

So then, is it actually possible for computers to understand humans?  Can we ever have conversations with computers?  In a sense, we already can!  This is thanks to a branch of AI called Natural Language Processing.

Read more


When Your Model Is Inaccurate

Let's imagine you're doing research on an ideal rental property.  You gather your data, open up your favorite programming environment, and get to work on performing Exploratory Data Analysis (EDA).  During your EDA, you find some dirty data and clean it to train on.  You decide on a model, separate the data into training, validation, and testing sets, and train your model on the cleaned data.  Upon evaluating your model using the validation and test data, you notice that both your validation error and your test error are very high.

Now suppose you pick a different model or add additional features.  Now your validation error is much lower.  Great!  However, upon using your testing data, you notice that the error is still high.  What just happened?

Read more


What are Neural Networks?

I admit, I'm late to the whole Neural Network party.  With all of the major news covering AI systems that use neural networks as part of their implementation, you'd have to be living under a rock to not know about them.  While it's true that they can provide more flexible models compared to other machine learning algorithms, they can be challenging to work with.

Read more


Setting up OpenCV for Java via Maven

When you learn about OpenCV, you'll often come across OpenCV for Python or C++, but not Java.  I can understand that OpenCV is a glorified NumPy extension for Python and OpenCV C++ is very fast.  However, it's possible that you have a legit need to use Java instead of Python or C++.

In a professional setting, Java users are likely to use Apache Maven so that everyone gets the same version of each library without causing build and run issues.  Sure, you can always install the library and set up the CLASSPATH to point at OpenCV, but I find it better to let Maven handle the libraries.  Just note that there is no official Maven repository for OpenCV at the time of writing, but others have uploaded alternative repositories.

Repository for OpenCV 2

For those who are using OpenCV 2 and Java, you'd want to use the nu.pattern repository.  Here is the dependency you need to add to your Maven file to import OpenCV:

<!-- https://mvnrepository.com/artifact/nu.pattern/opencv -->
<dependency>
    <groupId>nu.pattern</groupId>
    <artifactId>opencv</artifactId>
    <version>2.4.9-4</version>
</dependency>

Repository for OpenCV 3

For those needing to use OpenCV 3, the repository will be different.  There is no nu.pattern equivalent version for OpenCV 3.  You will need to use the following repository instead:

<!-- https://mvnrepository.com/artifact/org.openpnp/opencv -->
<dependency>
    <groupId>org.openpnp</groupId>
    <artifactId>opencv</artifactId>
    <version>3.4.2-0</version>
</dependency>

Repository for OpenCV 4

Yes, OpenCV 4 is being released soon.  As a result, don't expect a Maven repository for it just yet.  If you're interested in installing an early release of OpenCV 4 for Python, Adrian Rosebrock has posted some instructions for Mac OS X and Ubuntu users.

Loading the OpenCV Library

After adding the repository to your Maven file, you need to load the library for use.  Normally, you would use the line:

System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

However, this method won't work as it relies on the OpenCV libraries actually being installed.  Instead, you need to do the following:

nu.pattern.OpenCV.loadShared();
nu.pattern.OpenCV.loadLocally(); // Use in case loadShared() doesn't work

Once you call one of these methods, you should be able to use OpenCV normally.  OpenCV Java is akin to OpenCV C++, so you should be able to transfer some of the knowledge over to the other programming language.


Sport Recommendation Exercise

Sports.  Sports.  Sports.

Some people love watching them.  Others love playing them.  The US loves its football while Latin America loves its soccer.  As much as we fight and bicker about which sport is the best or that our favorite team is the best, many people love sports as a pastime and follow their teams religiously.

While I'm not a sports fan, I did come across an interesting dataset from data.world that tries to determine which sport is the toughest to pick up.  Even though this dataset is framed in an objective manner, I would like to ask a different question: Based on the sports data and a person's abilities, what sport would be optimal for them?

Read more