Earlier this week, I attended the O'Reilly AI Conference up in San Jose, CA. Wednesday and Thursday started off with keynotes showcasing what companies were currently researching in the field of AI. While I'm no expert in the field, I came away with four key takeaways from the keynotes.
by Joseph Woolf
In my previous post, I talked about a proof of concept for a self-adapting web scraper. As I added to the project, I had difficulty adding constraints to improve the accuracy of the extracted structure. After some time, I came to one conclusion: My Initial Design Was Flawed!
Last year, I created the IssueHunt-Statistics website project for tracking repositories, issues, and funding for open source projects. Shortly after, however, the website changed and my project broke down. I did update the scraping code to bring back functionality, only for it to break again a little while later.
I now have a problem: I don't want to keep spending time reworking the scraping code just to keep it functional. I wonder if I could automate this task?
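To make the failure mode concrete, here's a minimal sketch of the kind of hard-coded scraping that breaks whenever the page markup changes; the URL and CSS class below are placeholders for illustration, not IssueHunt's actual markup:

```python
# A brittle scraper sketch: the CSS class is a made-up placeholder, not
# IssueHunt's real markup. The moment the site renames or restructures
# this element, select() quietly returns an empty list and everything
# downstream falls apart.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://issuehunt.io/").text  # placeholder URL
soup = BeautifulSoup(html, "html.parser")

cards = soup.select("div.repo-card")  # hard-coded, hypothetical selector
print(f"Found {len(cards)} repository cards")
```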
To keep up with advances in technology, one activity that software engineers often take on is contributing to Open Source. I'll be restricting this to contributing to existing projects maintained by others, not your own.
However, there are some obstacles when contributing:
Some would see not contributing to Open Source as selfish: after all, you get to use free tools, so you should be grateful. I honestly don't like this line of thinking. Not everyone wants to spend all of their time programming. Some projects have contributing policies that are a hassle to deal with. And some people would rather do a side hustle and earn extra money.
Fortunately, there are a couple of websites that focus on earning money while contributing to Open Source. I ran across a few different sites:
For this post, I'll be mainly focusing on IssueHunt.
For those wanting to work with Big Data, it isn't enough to simply know a programming language and a small-scale library. Once your data reaches many gigabytes, if not terabytes, in size, working with it becomes cumbersome: your computer can only run so fast and store so much. At that point, you start looking into the tooling built for massive amounts of data. One of the tools you would consider is Apache Spark. In this post, we'll look at what Spark is, what we can do with it, and why you'd use it.
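As a small taste of the API, here is a minimal PySpark sketch; the file name and column are placeholders rather than anything from a specific dataset:

```python
# A minimal PySpark sketch: read a large CSV, count rows, and aggregate a
# column. "events.csv" and "category" are hypothetical stand-ins for your
# own data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-intro").getOrCreate()

# Spark reads the file lazily and splits the work across executors,
# so the same code scales from a laptop to a cluster.
df = spark.read.csv("events.csv", header=True, inferSchema=True)

print(df.count())
df.groupBy("category").count().orderBy("count", ascending=False).show(10)

spark.stop()
```

The appeal is that this script runs unchanged whether the CSV fits on one machine or is spread across a cluster.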
One of the datasets I picked up recently was a Kaggle dataset called "The Interview Attendance Problem". It covers job candidates in India attending interviews for several companies across a few different industries. The objective is to predict whether a candidate is likely to show up.
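Framed as code, this is a binary classification task. The sketch below uses made-up file and column names (the actual Kaggle schema differs) just to show the shape of the workflow:

```python
# A rough classification sketch: the file name, "attended" target, and
# feature handling are placeholder assumptions, not the dataset's real schema.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("interview_attendance.csv")  # hypothetical file name

# Treat "attended" as the yes/no target and one-hot encode the rest.
X = pd.get_dummies(df.drop(columns=["attended"]))
y = (df["attended"] == "Yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```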
With Data Science being such a popular field to get into, it's no surprise that the number of contributions to Kaggle has increased dramatically. I recently stumbled across a dataset that gathers the most popular kernels and decided to do some exploratory data analysis on it.
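A first pass at that kind of exploratory analysis might look like the sketch below; the file name and columns are assumptions for illustration, not the dataset's real schema:

```python
# A quick exploratory look at a kernels dataset; "popular_kernels.csv" and
# the "language", "author", and "votes" columns are hypothetical.
import pandas as pd

kernels = pd.read_csv("popular_kernels.csv")

# Basic shape, types, and summary statistics.
print(kernels.shape)
print(kernels.dtypes)
print(kernels.describe(include="all").T.head(15))

# Which languages show up most among the top kernels?
print(kernels["language"].value_counts().head(10))

# Do the most-voted kernels cluster around a few authors?
print(kernels.groupby("author")["votes"].sum()
             .sort_values(ascending=False).head(10))
```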