Supervised Learning Algorithms

Supervised Learning is like having a set of examples with all correct answers already provided.

Core Concept

Supervised learning trains models using labeled examples (like showing "this is a dog, this is a cat") so they can predict correct outputs for new, unseen data.

Supervised Learning: Train on labeled data (inputs with known outputs) to predict new outcomes.

  • Training Data: Examples with features (inputs) and labels (correct outputs).
  • Model: Learns the relationship between features and labels.
  • Evaluation & Generalization: Check accuracy and ability to predict unseen data.

Main Algorithms

  • Linear Regression → Predicts numbers with a best-fit line.
  • Logistic Regression → Classifies into categories using probabilities (e.g., spam/not spam).
  • Decision Trees → Uses yes/no questions to classify or predict.
  • Naive Bayes → Probability-based classification (often used for text, e.g., spam filters).
  • SVM (Support Vector Machines) → Finds the best boundary with maximum margin between classes.

Supervised Learning

By repeatedly showing the model labeled data such as "this is a dog" and "this is a cat", it learns to distinguish these animals based on characteristics such as shape, size, and color. Supervised models are fed large datasets of labeled examples and trained on them so that they can predict the labels for new examples.

Training Data: This is the labeled dataset used to train the ML model. The quality and quantity of the training data are very important for the model's accuracy.

Features: Characteristics of the data that serve as input to the model. Training datasets pair these input features with corresponding output labels.

Label: Labels are the known outcomes associated with each data point in the training set. They are the correct answers the model should learn to predict.

Model: A mathematical representation of the relationship between the features and labels. It is like a function that takes features as input and outputs predictions for the label.

Evaluation: Assessing the model's performance to determine its accuracy and its ability to generalize to new data.

Generalization: The model's ability to accurately predict outcomes for new, unseen data.
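Evaluation and generalization can be sketched with a simple holdout split: train on part of the labeled data and measure accuracy on the part the model has never seen. The data and the "majority label" model below are invented purely for illustration.

```python
# Minimal sketch: hold out part of the labeled data to estimate generalization.
# (feature, label) pairs; values are made up for illustration.
data = [(1, "cat"), (2, "cat"), (3, "cat"), (4, "dog"),
        (5, "cat"), (6, "dog")]
train, test = data[:4], data[4:]          # simple holdout split

# A toy "model": always predict the majority label seen in training.
labels = [y for _, y in train]
majority = max(set(labels), key=labels.count)
predictions = [majority for _ in test]

# Accuracy on held-out data estimates how well the model generalizes.
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
```

A real workflow would use a proper model and a larger test set, but the principle is the same: performance must be measured on data that was not used for training.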

Linear Regression

An algorithm that predicts numbers by finding the straight line that best fits the data. The algorithm draws a line through the data points that minimizes the errors between predicted and actual values. It assumes that when one thing changes, the other changes proportionally. For example, to predict house prices from size, linear regression finds the best straight line showing how price increases as size increases. Once you have this line, you can predict any house's price by looking at its size.

A linear equation represents the relationship between the input x and the output y:

y = mx + c

Multiple Linear Regression

y = b0 + b1x1 + b2x2 + ... + bnxn
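The one-variable case y = mx + c can be sketched with an ordinary least-squares fit. The house sizes and prices below are invented, and chosen so that price is exactly 3 × size, which the fit should recover.

```python
# Minimal sketch: fit y = mx + c by ordinary least squares.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope m = covariance(x, y) / variance(x); intercept from the means.
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    c = mean_y - m * mean_x
    return m, c

sizes  = [50, 80, 100, 120]      # square metres (hypothetical)
prices = [150, 240, 300, 360]    # thousands; exactly 3 * size here
m, c = fit_line(sizes, prices)   # recovers m = 3.0, c = 0.0

predicted = m * 90 + c           # predict the price of a 90 m² house
```

Multiple linear regression generalizes the same idea to several features, typically solved with matrix algebra or a library rather than by hand.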

Logistic Regression

Logistic Regression is used for classification (sorting things into categories), not predicting numbers like regular regression. It predicts which of two categories something belongs to, like spam/not spam, yes/no, or pass/fail. Instead of drawing a straight line, it calculates the probability (chance) that something belongs to each category. The result is always between 0% and 100%.

To predict if an email is spam, logistic regression looks at features like sender, subject line, and content. It then gives a probability like "85% chance this is spam." If the probability is over 50%, it classifies the email as spam.

It uses the sigmoid function to map z, a weighted sum of the features, to a probability:

σ(z) = 1 / (1 + e^-z)
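The sigmoid step can be sketched directly. The weights, bias, and feature values below are made up; in practice they would be learned from training data.

```python
# Minimal sketch: sigmoid squashes any score z into the range (0, 1).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# z is a weighted sum of the features plus a bias (values invented here).
weights, bias = [1.2, -0.8], 0.5
features = [2.0, 1.0]                 # hypothetical email features
z = sum(w * x for w, x in zip(weights, features)) + bias

p = sigmoid(z)                        # probability the email is spam
label = "spam" if p > 0.5 else "not spam"
```

Note that sigmoid(0) is exactly 0.5, so z = 0 is the decision boundary between the two classes.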

Decision Trees

Decision Trees are supervised learning algorithms used for classification and regression tasks. They create easy-to-understand models that make predictions by following simple decision rules learned from the data's features. A tree makes predictions by asking yes/no questions about the data, leading to a final decision. For weather, it might ask: Is it sunny? Is it windy?

Splits are chosen using Gini Impurity, which measures how mixed the classes in a node are:

Gini(S) = 1 - Σ (pi)^2

or Entropy, which measures disorder:

Entropy(S) = - Σ pi * log2(pi)
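Both measures can be computed directly from a node's labels. The "play"/"stay" labels below are invented; a pure node scores 0 on both measures, while a 50/50 binary split gives a Gini of 0.5 and an entropy of 1 bit.

```python
# Minimal sketch: Gini impurity and entropy of a node's class labels.
import math

def gini(labels):
    """Gini(S) = 1 - sum(p_i^2); 0 for a pure node, higher when mixed."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)); 0 pure, 1 for a 50/50 split."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

pure  = ["play", "play", "play", "play"]   # all one class
mixed = ["play", "play", "stay", "stay"]   # 50/50 split
```

A tree-building algorithm tries candidate questions and keeps the one that reduces impurity the most across the resulting child nodes.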

Naive Bayes

Naive Bayes is a classification algorithm that uses probability to make predictions. It's based on Bayes' theorem, which calculates the probability of something happening based on prior knowledge and new evidence. It is called "naive" because it assumes the features (for example, individual words) are independent of each other. Naive Bayes looks at the words in an email and calculates: "What's the probability this email is spam given that it contains words like 'free,' 'money,' and 'click here'?"

P(A|B) = [P(B|A) * P(A)] / P(B)
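Applying the theorem to the spam example is plain arithmetic. Every probability below is invented for illustration: a prior P(spam), the likelihood of seeing the word "free" in spam versus non-spam, and P(free) computed by total probability.

```python
# Minimal sketch: Bayes' theorem for "is this email spam given the word 'free'?"
# All probabilities are made up for illustration.
p_spam = 0.4                 # P(A): prior probability an email is spam
p_free_given_spam = 0.6      # P(B|A): "free" appears in spam emails
p_free_given_ham = 0.05      # "free" appears in non-spam emails

# P(B) by total probability: "free" can appear in spam or non-spam.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
```

Seeing the word "free" raises the spam probability well above the 0.4 prior. A full Naive Bayes classifier multiplies such likelihoods across all the words in the email, relying on the independence assumption.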

Support Vector Machines (SVMs)

Algorithms for classification and regression tasks. They excel at handling complex, high-dimensional data by finding the best line or boundary to separate different classes of data. The key idea is to place this boundary so that it has the largest possible margin: the space between the boundary and the closest data points (called support vectors). A bigger margin generally makes the model more robust on new data.
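The margin of a given linear boundary is easy to compute: it is the smallest distance from any training point to the hyperplane w·x + b = 0. The 2-D points and the candidate boundary below are invented; an SVM trainer would search for the w and b that maximize this quantity.

```python
# Minimal sketch: the margin of a linear boundary w.x + b = 0.
import math

def margin(w, b, points):
    """Smallest distance from any point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x in points)

# Hypothetical 2-D points from two classes, and the boundary x + y = 3.5.
class_a = [(0.0, 0.0), (1.0, 0.0)]
class_b = [(3.0, 3.0), (4.0, 3.0)]
w, b = (1.0, 1.0), -3.5

m = margin(w, b, class_a + class_b)
```

The points that achieve this minimum distance are the support vectors; moving any other point would not change the boundary an SVM selects.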