Math Concepts for Supervised and Unsupervised Machine Learning
- Joy Tech

- Mar 18, 2023
- 2 min read
Updated: Mar 20, 2023
I. Introduction
Definition of machine learning and its mathematical foundations: probability theory, linear algebra, calculus
Types of machine learning: Supervised, unsupervised, reinforcement
Applications of machine learning: Natural language processing, computer vision, fraud detection, recommendation systems
II. Supervised Learning
A. Basics of Supervised Learning
Types of problems:
Classification: Bayes' theorem, decision boundaries, softmax function, cross-entropy loss
Regression: Linear algebra (matrices, vectors), calculus (derivatives, partial derivatives), loss functions (mean squared error, mean absolute error)
Learning process:
Training: Optimization algorithms (gradient descent, stochastic gradient descent, Adam), backpropagation (chain rule of derivatives)
Testing: Prediction, inference
Evaluation metrics:
Classification: Confusion matrix, accuracy, precision, recall, F1 score, ROC curve, AUC
Regression: Mean squared error, mean absolute error, R-squared, explained variance
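The classification metrics above follow directly from the confusion-matrix counts. A minimal sketch in plain Python (the counts are made-up toy numbers, and `precision_recall_f1` is an illustrative helper, not a library API):

```python
# Toy sketch: precision, recall, and F1 from confusion-matrix counts.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)          # of predicted positives, how many were right
    recall = tp / (tp + fn)             # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
```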
B. Linear Models
Linear Regression:
Optimization algorithms: Normal equation, gradient descent
Linear algebra: Matrix multiplication, inverse, transpose
Calculus: Partial derivatives, gradients
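The calculus pieces listed above come together in gradient descent for linear regression: take partial derivatives of the mean squared error with respect to each parameter and step downhill. A hedged sketch with made-up data generated by y = 2x + 1 (the function name and hyperparameters are illustrative choices):

```python
# Minimal sketch: fit y = w*x + b by batch gradient descent on mean squared error.
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # partial derivatives of MSE with respect to w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated by y = 2x + 1
w, b = fit_linear(xs, ys)   # recovers w ≈ 2, b ≈ 1
```

The normal equation would solve the same problem in closed form via a matrix inverse; gradient descent is shown here because it scales to models without a closed-form solution.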
Logistic Regression:
Optimization algorithms: Gradient descent, Newton's method
Bayes' theorem: Probability theory, conditional probabilities
Linear algebra: Matrix multiplication
Naive Bayes Classifier:
Bayes' theorem: Probability theory, conditional probabilities
Probability theory: Joint probabilities, marginal probabilities
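Bayes' theorem plus the naive conditional-independence assumption reduces to multiplying a prior by per-feature likelihoods and normalizing by the marginal. A toy sketch with invented spam-filter probabilities (all numbers and names here are made up for illustration):

```python
# Sketch of a naive Bayes posterior for binary features.
def posterior(priors, likelihoods, features):
    """priors: {class: P(class)}; likelihoods: {class: [P(feature_i = 1 | class)]}.
    Naive assumption: features are conditionally independent given the class."""
    joint = {}
    for c, p in priors.items():
        for x, lk in zip(features, likelihoods[c]):
            p *= lk if x == 1 else (1 - lk)   # multiply in each feature's likelihood
        joint[c] = p                          # unnormalized joint probability
    total = sum(joint.values())               # marginal probability of the evidence
    return {c: p / total for c, p in joint.items()}

post = posterior(
    priors={"spam": 0.4, "ham": 0.6},
    likelihoods={"spam": [0.8, 0.7], "ham": [0.1, 0.3]},
    features=[1, 1],
)   # both features present strongly favors "spam"
```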
Support Vector Machines:
Optimization algorithms: Quadratic programming, dual problem, kernel methods
Calculus: Lagrange multipliers, partial derivatives
Linear algebra: Inner product, norms, Gram matrix
C. Tree-Based Models
Decision Trees:
Entropy: Information theory, probability theory
Information gain: Entropy, conditional probabilities
Tree traversal: Depth-first search, breadth-first search
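Entropy and information gain can be computed in a few lines: gain is the parent's entropy minus the size-weighted entropy of the children. A sketch with a toy binary split (function names are illustrative):

```python
import math

# Sketch: Shannon entropy of a label list, and information gain of a binary split.
def entropy(labels):
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, left, right):
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = [1, 1, 1, 0, 0, 0]
gain = information_gain(parent, left=[1, 1, 1], right=[0, 0, 0])
# a perfect split removes all uncertainty: gain = 1 bit
```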
Random Forests:
Ensemble learning: Bagging, bootstrap sampling
Decision trees: Gini impurity, information gain
Gradient Boosted Trees:
Gradient descent: Optimization algorithm, partial derivatives
Decision trees: Regression trees, loss functions (mean squared error, mean absolute error)
D. Instance-Based Models
k-Nearest Neighbors:
Distance metrics: Euclidean distance, Manhattan distance, cosine similarity
Voronoi diagrams: Geometry, Delaunay triangulation
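k-nearest neighbors needs only a distance metric and a majority vote. A sketch with Euclidean distance and a made-up two-cluster dataset (`knn_predict` is an illustrative helper, not a library API):

```python
import math
from collections import Counter

# Sketch of k-nearest neighbors with Euclidean distance on toy 2-D data.
def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs; query: point as a tuple."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])   # majority vote among k nearest
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
label = knn_predict(train, (0.5, 0.5), k=3)   # nearest neighbors are all "a"
```

Swapping `math.dist` for a Manhattan or cosine distance function changes only the first line of the sort.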
E. Deep Learning Models
Artificial Neural Networks:
Perceptron learning rule: Activation functions, linear combinations
Backpropagation: Chain rule of derivatives, gradients
Activation functions: Sigmoid, ReLU, softmax
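The three activation functions named above are short enough to write out directly (standalone definitions for illustration, not a framework API):

```python
import math

# Sketch of three common activation functions.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))      # squashes to (0, 1)

def relu(z):
    return max(0.0, z)                     # zero for negatives, identity otherwise

def softmax(zs):
    m = max(zs)                            # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]       # a probability distribution over classes

probs = softmax([1.0, 2.0, 3.0])           # sums to 1, largest logit wins
```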
Convolutional Neural Networks:
Convolutional layers: Convolution operation, feature maps, stride, padding
Pooling layers: Max pooling, average pooling
ReLU activation: Nonlinearity, rectification
Recurrent Neural Networks:
Backpropagation through time: Unfolding, gradients
LSTM units: Memory cells, input/output/forget gates, activation functions (sigmoid, tanh)
III. Unsupervised Learning
A. Basics of Unsupervised Learning
Probability distributions and density estimation
Clustering methods: k-means, hierarchical clustering, density-based clustering
Dimensionality reduction techniques: principal component analysis (PCA), independent component analysis (ICA), non-negative matrix factorization (NMF), t-distributed stochastic neighbor embedding (t-SNE)
Information theory: entropy, mutual information, KL divergence
Optimization: gradient descent, stochastic gradient descent
B. Clustering Algorithms
Distance measures: Euclidean distance, Manhattan distance, cosine similarity
Objective and evaluation measures: inertia (within-cluster sum of squares), silhouette score
Optimization: Lloyd's algorithm for k-means, agglomerative hierarchical clustering
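Lloyd's algorithm alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A toy 2-D sketch with fixed initial centroids for reproducibility (data and function name are illustrative):

```python
import math

# Toy sketch of Lloyd's algorithm for k-means in 2-D.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        # update step: each centroid moves to the mean of its cluster
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers = kmeans(points, centroids=[(0, 0), (10, 10)])
# converges to the two cluster means, (1/3, 1/3) and (28/3, 28/3)
```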
C. Dimensionality Reduction Algorithms
Linear algebra: eigenvectors, eigenvalues, singular value decomposition (SVD)
Optimization: gradient descent, stochastic gradient descent, Adam optimization
Information theory: entropy, mutual information, KL divergence
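The eigenvector machinery behind PCA can be illustrated with power iteration, which repeatedly multiplies and renormalizes a vector until it aligns with the dominant eigenvector. A sketch on a small symmetric matrix (matrix and function name chosen for illustration):

```python
import math

# Sketch: power iteration finds the dominant eigenvalue/eigenvector of a
# symmetric matrix -- the core linear-algebra step behind PCA.
def power_iteration(A, iters=100):
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]                                      # renormalize
    # Rayleigh quotient v^T A v gives the corresponding eigenvalue
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Av[i] for i in range(n))
    return eigenvalue, v

lam, v = power_iteration([[2.0, 1.0], [1.0, 2.0]])
# eigenvalues of this matrix are 3 and 1; iteration converges to 3
```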
IV. Reinforcement Learning
A. Basics of Reinforcement Learning
Probability theory: Markov decision processes, stochastic policies, state transition probabilities
Bellman equation: state-value function, action-value function, optimal policy
Exploration vs. exploitation trade-off
B. Algorithms
Q-Learning: Bellman equation, off-policy learning, epsilon-greedy policy
Deep Q-Networks: experience replay, target network, neural network approximators
Policy gradient methods: policy gradient theorem, REINFORCE algorithm
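Tabular Q-learning ties the pieces above together: an epsilon-greedy policy explores, and each step applies the Bellman update toward reward plus the discounted best next value. A toy sketch on a made-up four-state corridor where moving right from the last state yields reward 1 (environment, function name, and hyperparameters are all illustrative):

```python
import random

# Toy sketch of tabular Q-learning on a 4-state corridor.
def q_learning(episodes=300, alpha=0.5, gamma=0.9, eps=0.3, n_states=4, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]   # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: explore with probability eps, else act greedily
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            if a == 1 and s == n_states - 1:
                reward, s2, done = 1.0, s, True          # goal reached
            elif a == 1:
                reward, s2 = 0.0, s + 1
            else:
                reward, s2 = 0.0, max(s - 1, 0)
            # Bellman update toward reward + discounted best future value
            target = reward if done else reward + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Q[s][right] converges toward gamma**(3 - s): 0.729, 0.81, 0.9, 1.0
```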
V. Advanced Topics
A. Hyperparameter Tuning
Optimization: grid search, random search, Bayesian optimization
Cross-validation, overfitting
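Grid search is exhaustive evaluation over the Cartesian product of hyperparameter values. A sketch where a toy scoring function stands in for a cross-validated metric (the grid values and "sweet spot" are invented for illustration):

```python
from itertools import product

# Sketch of exhaustive grid search over a hyperparameter grid.
def grid_search(grid, score):
    best = None
    for combo in product(*grid.values()):        # every combination of values
        params = dict(zip(grid.keys(), combo))
        s = score(params)                        # in practice: cross-validated score
        if best is None or s > best[1]:
            best = (params, s)
    return best

grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 4]}
# toy score: pretend lr = 0.1, depth = 4 is the sweet spot
best_params, best_score = grid_search(
    grid, lambda p: -abs(p["lr"] - 0.1) - abs(p["depth"] - 4)
)
```

Random search and Bayesian optimization replace the exhaustive loop with sampled or model-guided candidate points, which matters once the grid grows combinatorially.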
B. Regularization
L1 regularization, L2 regularization, Elastic Net regularization
Ridge regression, Lasso regression
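The penalties behind these methods are simple functions of the weight vector: ridge adds a sum of squared weights, lasso a sum of absolute values, and elastic net a mix of both. A sketch (function names and the toy weights are illustrative):

```python
# Sketch: regularization penalties added to the data loss during training.
def l2_penalty(w, lam):
    # ridge: sum of squared weights, shrinks all weights smoothly
    return lam * sum(wi * wi for wi in w)

def l1_penalty(w, lam):
    # lasso: sum of absolute weights, pushes some weights exactly to zero
    return lam * sum(abs(wi) for wi in w)

def elastic_net_penalty(w, lam, alpha):
    # alpha = 1 is pure lasso, alpha = 0 is pure ridge
    return alpha * l1_penalty(w, lam) + (1 - alpha) * l2_penalty(w, lam)

w = [3.0, -4.0]
```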
C. Gradient Descent
Batch gradient descent, stochastic gradient descent, mini-batch gradient descent
Momentum, learning rate scheduling
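Momentum keeps a decaying running sum of past gradients and steps along that velocity instead of the raw gradient. A sketch minimizing the toy objective f(w) = (w - 3)^2 (function name and hyperparameters are illustrative):

```python
# Sketch: gradient descent with momentum on a one-dimensional objective.
def momentum_descent(grad, w=0.0, lr=0.1, beta=0.9, steps=300):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)   # velocity: decaying sum of past gradients
        w -= lr * v              # step along the velocity, not the raw gradient
    return w

w_star = momentum_descent(lambda w: 2 * (w - 3))   # gradient of (w - 3)**2
# converges to the minimizer w = 3
```

Learning-rate scheduling would additionally shrink `lr` over time, trading early speed for late-stage precision.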
D. Ensemble Methods
Bagging: bootstrap aggregating, random forest
Boosting: adaptive boosting, gradient boosting, XGBoost
Stacking: meta-learner, ensemble of models
E. Transfer Learning
Pretrained models, fine-tuning
Domain adaptation, multi-task learning
F. Explainable AI
Local interpretable model-agnostic explanations (LIME)
Shapley Additive exPlanations (SHAP)