Chapter 2. Supervised Learning

What is Supervised Learning?

Supervised Learning is a form of machine learning that learns the relationship between given input and output pairs. Its objective is to make accurate predictions for new, unseen data. Creating the training set requires human involvement, because someone has to establish the correct output for each input.
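To make the idea concrete, here is a minimal sketch in plain Python (no libraries). It assumes a made-up training set whose outputs follow y = 2x + 1 and uses a straight-line model fitted by least squares; both the data and the model choice are illustrative assumptions, not something prescribed above. The model only sees labeled (input, output) pairs, then predicts the output for an input it has never seen.

```python
# Minimal sketch of supervised learning: learn from labeled (input, output)
# pairs, then predict an unseen input. The data and the straight-line model
# are illustrative assumptions.

def fit_line(pairs):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Training set: each output was provided in advance (e.g. labeled by a human).
training_pairs = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
model = fit_line(training_pairs)
print(predict(model, 10.0))  # an input that is not in the training set
```

Because the training pairs lie exactly on y = 2x + 1, the fitted line recovers slope 2.0 and intercept 1.0, so the prediction for the unseen input 10.0 is 21.0.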

2.1 Classification and Regression

Supervised Learning comes in two main types: classification and regression. The aim of classification is to predict a class label, that is, a choice from a predefined list of options. Depending on the number of classes, classification is split into binary classification (distinguishing between exactly two classes) and multiclass classification (distinguishing between more than two classes). Regression, on the other hand, aims to predict a continuous number, or a floating-point number in programming terms. A simple way to tell the two apart is to ask whether there is continuity in the possible outputs. If there is a smooth range between possible outcomes, it’s a regression problem. For instance, predicting a person’s annual income is regression: if we predict $40,001 instead of the correct $40,000, the difference hardly matters. In contrast, when predicting the language of a written text, there is no continuity between English and German (no language lies in between them), so this is a classification problem.
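One way to see the difference is to look at how a wrong prediction is scored. In this minimal sketch the helper names and the spam/ham labels are hypothetical; absolute error is one common regression measure and 0/1 error one common classification measure. A $1 miss on an income costs almost nothing, while predicting German for an English text is simply wrong.

```python
# Hypothetical helpers contrasting how errors behave in the two task types.

def regression_error(predicted, actual):
    """Absolute error: a near miss on a continuous value costs little."""
    return abs(predicted - actual)


def classification_error(predicted, actual):
    """0/1 error: a class label is either right or wrong, with no 'almost'."""
    return 0 if predicted == actual else 1


def classification_kind(labels):
    """Binary for exactly two distinct classes, multiclass for more."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("classification needs at least two classes")
    return "binary" if n_classes == 2 else "multiclass"


print(regression_error(40_001, 40_000))              # off by $1: a near miss
print(classification_error("German", "English"))     # wrong language: error 1
print(classification_kind(["spam", "ham", "spam"]))  # binary
print(classification_kind(["English", "German", "French"]))  # multiclass
```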

2.2 Generalization, Overfitting, Underfitting

Generalization refers to a model’s ability to make accurate predictions on new, previously unseen data that comes from the same source as the data used to build the model. Underfitting happens when the model does not perform well even on the training data, which indicates that it fails to capture the relationship between the input examples (often called X) and the target values (often called y). This usually occurs when the model is too simple, and it can be addressed with a more flexible model or less regularization. Overfitting, on the other hand, occurs when the model performs well on the training data but poorly on the evaluation data. This indicates that the model has memorized the training data and struggles to apply what it learned to new examples. Overfitting usually happens when the model is too complex, and it can be addressed with a simpler model or more regularization.
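The sketch below compares training and test accuracy for three toy classifiers on a hypothetical one-dimensional dataset whose true rule is "label 1 if x > 5", with one deliberately mislabeled training point. The majority-class model is too simple (underfitting), the 1-nearest-neighbor model memorizes every training point including the noise (overfitting), and a single-threshold "stump" sits in between. The data, the model choices, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical 1-D data: the true rule is "label 1 if x > 5", but the
# training point (2, 1) is deliberately mislabeled noise.
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(1.9, 0), (2.1, 0), (6.2, 1), (8.8, 1)]


def accuracy(predict, data):
    """Fraction of (x, y) pairs where predict(x) == y."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_majority(data):
    """Too simple: always predicts the most common training label."""
    label = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: label


def fit_one_nn(data):
    """Too flexible: copies the label of the single closest training point."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]


def fit_stump(data):
    """In between: one threshold t, predicting 1 when x >= t."""
    best_t, best_acc = None, -1.0
    for t, _ in data:  # try each training x as a candidate threshold
        acc = accuracy(lambda x, t=t: int(x >= t), data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x >= best_t)


for name, fit in [("majority", fit_majority), ("1-NN", fit_one_nn), ("stump", fit_stump)]:
    model = fit(train)
    print(f"{name:9} train={accuracy(model, train):.3f} test={accuracy(model, test):.3f}")
```

With this data, the majority model scores 0.625 on the training data and 0.5 on the test data, the 1-NN model scores a perfect 1.0 on training but only 0.5 on test, and the stump scores 0.875 on training and 1.0 on test. A large gap between training and test accuracy is the overfitting signal; low accuracy on both is the underfitting signal.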

What is regularization? (Mainly based on this Medium post: https://towardsdatascience.com/regularization-in-machine-learning-76441ddcf99a)
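As one concrete example of regularization, here is a minimal sketch of L2 (ridge) regularization for a single-feature linear model y = w * x with no intercept (a simplifying assumption). The penalty alpha * w**2 is added to the squared error, and the closed-form minimizer shows how a larger alpha shrinks the weight toward zero, which gives a simpler model. The data and the function name ridge_weight are hypothetical.

```python
# Minimal sketch of L2 (ridge) regularization for a one-feature linear model
# y ~ w * x with no intercept (a simplifying assumption). Minimizing
#     sum((y - w * x) ** 2) + alpha * w ** 2
# gives the closed-form solution w = sum(x * y) / (sum(x * x) + alpha).

def ridge_weight(xs, ys, alpha):
    """Return the penalized least-squares weight; alpha >= 0 is the penalty strength."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # hypothetical data generated by y = 2x

for alpha in [0.0, 14.0, 140.0]:
    print(alpha, ridge_weight(xs, ys, alpha))  # weight shrinks as alpha grows
```

For this data, alpha = 0 gives the ordinary least-squares weight 2.0, alpha = 14 halves it to 1.0, and alpha = 140 shrinks it to about 0.18. Too little regularization leaves the model free to overfit; too much pushes it toward underfitting.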

2.2.1 Relation of Model Complexity to Dataset Size

2.3 Supervised Machine Learning Algorithms

2.3.1 K-Nearest Neighbors

K-Neighbors classification