Types of Machine Learning
Karen Tao, UX Researcher
November 12, 2020
Photo by Glenn Carstens-Peters
On your journey to mastering machine learning, it is crucial to understand the main types of machine learning models. You may have heard the terms “supervised” and “unsupervised” machine learning. We will explore the differences between these two with some examples.
Supervised machine learning: the algorithm learns from a labeled dataset, meaning the ground truth is available in the data. For example, suppose we have data on students who graduated with postsecondary degrees in Utah and whether they remained in the Utah workforce one year after graduation. For each student, we have features or attributes such as gender, age, major, the school attended, the type of degree obtained, and GPA. We also have the outcome variable: whether the student appeared in the workforce data one year after graduating. The outcome in this case is binary, because the student either participated or did not participate in the Utah workforce one year after graduation. Workforce participation is the outcome “label” in this example, and it is the variable our model tries to predict. Because the data is labeled, we know the ground truth of whether each student worked in Utah after graduating.
To train our model, we would first separate the students into a training set and a test set. The model is trained using data for the students in the training set along with their outcome labels. We then evaluate the model by giving it the test set without the labels. These are students the model has not previously seen. The model receives only the features of the students in the test set and predicts whether each student participates in the workforce. Finally, we compare the model's predictions to the ground truth, which was withheld from the model. This gives us an idea of how well the model can predict outcomes for new students. Our model is essentially learning the mapping function from the input, students’ features, to the output, workforce participation.
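The train-and-test workflow described above can be sketched in a few lines of plain Python. Everything in this sketch is an assumption made for illustration: the twelve student records are invented toy data (not real Utah records), only two features (GPA and age) are used, and the "model" is a simple nearest-centroid classifier chosen because it fits in a short example, not because it is the right model for this problem.

```python
import random

# Invented toy data for illustration only: (GPA, age, worked_in_utah label).
students = [
    (3.8, 22, 1), (3.6, 23, 1), (3.9, 24, 1),
    (3.5, 22, 1), (3.7, 25, 1), (3.4, 23, 1),
    (2.6, 30, 0), (2.9, 31, 0), (2.4, 31, 0),
    (3.0, 32, 0), (2.7, 33, 0), (2.5, 30, 0),
]

def split(rows, test_fraction, seed):
    """Shuffle a copy of the rows, then split into training and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def fit_centroids(train_rows):
    """Learn one average feature vector (centroid) per outcome label."""
    sums = {}
    for gpa, age, label in train_rows:
        total = sums.setdefault(label, [0.0, 0.0, 0])
        total[0] += gpa
        total[1] += age
        total[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}

def predict(centroids, gpa, age):
    """Predict the label whose centroid is closest to the given features."""
    return min(
        centroids,
        key=lambda label: (gpa - centroids[label][0]) ** 2
        + (age - centroids[label][1]) ** 2,
    )

# Training uses features and labels together.
train_rows, test_rows = split(students, test_fraction=0.25, seed=0)
model = fit_centroids(train_rows)

# Evaluation: the model sees only the features of the test students.
predictions = [predict(model, gpa, age) for gpa, age, _ in test_rows]
ground_truth = [label for _, _, label in test_rows]

# Compare predictions to the withheld labels.
accuracy = sum(p == t for p, t in zip(predictions, ground_truth)) / len(ground_truth)
print(f"Test accuracy: {accuracy:.2f}")
```

The toy groups are deliberately well separated, so the score here says nothing about real students. With real data, scikit-learn's `train_test_split` and one of its classifiers would typically replace the hand-written helpers.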
Unsupervised machine learning: the algorithm learns the underlying or hidden structure or distribution in the data. We do not have labels for our data points. Instead, the model separates unsorted data points into groups according to their similarities, patterns, and differences. For example, say we have many documents on various topics, and we’d like to organize them into categories such as sports, science, and finance. We could use a clustering technique to measure the similarities between the documents and let the model sort them into groups that we did not define in advance. We would then inspect each group to see which topic it represents. Another model you may have encountered that often relies on unsupervised techniques is a recommendation system. Netflix may have recommended certain shows for you to binge based on how similar your past viewings are to those of other Netflix viewers, and on what those viewers enjoyed. Amazon may have recommended items for you based on how similar your purchase history is to that of other Amazon shoppers and what they added to their carts. Even the fraud detection that your bank card provides is likely powered, at least in part, by an unsupervised model that flags unusual transactions. In each case, the unsupervised model works on its own to discover information and patterns.
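The document clustering idea above can be sketched with a small k-means implementation in plain Python. This is a minimal sketch under stated assumptions: the six "documents" are invented word lists, each document is represented by simple word counts, the number of groups is fixed at two, and the starting centroids are taken from the first and last documents so that the run is deterministic (k-means is usually started from random points). Note that no labels are passed to the algorithm.

```python
from collections import Counter

# Invented toy documents for illustration; no topic labels are provided.
documents = [
    "team game score coach",
    "game team win score",
    "coach team game player",
    "stock market bank rates",
    "bank rates stock investor",
    "market investor stock bank",
]

# Every distinct word becomes one dimension of the feature vector.
vocabulary = sorted({word for doc in documents for word in doc.split()})

def vectorize(doc):
    """Turn a document into a list of word counts over the vocabulary."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, centroids, max_iterations=10):
    """Alternate between assigning points and updating centroids."""
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so we have converged
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

vectors = [vectorize(doc) for doc in documents]
labels = k_means(vectors, [vectors[0], vectors[-1]])

for doc, label in zip(documents, labels):
    print(label, doc)
```

The group numbers carry no meaning of their own. It is up to us to read the documents in each group and decide that one looks like sports and the other like finance.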
These two main types of machine learning are tools for data scientists. Knowing which one to choose requires understanding the data that is available and the task at hand, and then selecting the best algorithm for the problem we face. In addition to supervised and unsupervised learning, reinforcement learning and semi-supervised learning are gaining popularity. The best way to get familiar with the models is to implement them. For Python, check out the scikit-learn library; for R, check out the caret package. Happy coding!