Machine Learning: definitions


This is a simple list of terms and definitions useful when studying machine learning. I created the list for myself, but it may be useful to others studying the topic.

Definition of machine learning (Tom Mitchell): A computer program is said to learn from experience $E$ with respect to some task $T$ and some performance measure $P$, if its performance on $T$, as measured by $P$, improves with experience $E$.
Training set: Collection of examples the system uses to learn.
Training instance (or sample): Each example in a training set.
Test set: Collection of examples used to test the system. The available data is typically divided into two sets, a training set and a test set; a reasonable split is 80% training set and 20% test set.
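The 80/20 split can be sketched in a few lines of Python; the function name, the fixed seed, and the toy data below are my own illustrative choices, not part of the original text:

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data and split it into a training set and a test set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = data[:]                 # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    test_size = int(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]

train_set, test_set = train_test_split(list(range(100)))
# 80 training instances, 20 test instances
```

Shuffling before splitting matters: if the data is ordered (e.g. by date or by class), a plain slice would give a nonrepresentative test set.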
Attribute: A data type associated with each example in the training set (e.g. mileage of a car).
Feature: An attribute together with its value (e.g. mileage = 15000).
Data mining: Using machine learning techniques to dig into large amounts of data and discover patterns that were not previously apparent.
Supervised machine learning: A machine learning system trained on data that includes the desired solutions (labels).
Unsupervised machine learning: A machine learning system trained on unlabeled data.
Semisupervised machine learning: A machine learning system trained on partially labeled data.
Reinforcement learning: A machine learning system capable of learning by receiving rewards or penalties for the actions it performs. Its aim is to find the best strategy (policy) to maximize the reward.
Batch learning (or offline learning): Training technique where the training set is entirely provided aheadad of time. The system cannot learn incrementally.
Online learning: Training technique where learning is carried out incrementally. This technique is useful in contexts where data arrives as a flow (e.g. stock prices).
Learning rate: Measure of how quickly the system adapts to changing data. This measure only makes sense for online learning systems.
Instance-based learning: Generalization technique where the system learns the examples by heart and generalizes to new cases by measuring their distance to the learned ones.
Model-based learning: Generalization technique where a model is built from the examples, and the model makes the predictions (e.g. linear regression).
Sampling noise: Undesirable effect of training in which the training set is too small, and nonrepresentative data sneak in as a result of chance.
Sampling bias: Undesirable effect of training in which the training set may be large, but is nonrepresentative of the entire context.
Feature engineering: The process of coming up with a set of relevant and representative features to train the system; it includes feature selection and feature extraction.
Feature selection: The process of selecting the most relevant features to train the system.
Feature extraction: The process of combining existing features to produce more useful ones, or reducing their number.
Overfitting: The situation in which the model performs well on the training data but does not generalize well. Constraining a model to reduce the risk of overfitting is called regularization.
Underfitting: The situation in which the model is too simple to learn the underlying structure of the data, so its predictions are poor.
Hyperparameter: A parameter of the learning algorithm, not of the model itself. It is set before training and controls how learning proceeds (e.g. the amount of regularization to apply).
Degrees of freedom: The number of parameters of the model.
Generalization error (or out-of-sample error): Error rate on the new cases on which the system is tested.
Holdout validation: Technique used to select a model and tune hyperparameters. Part of the training set is held out to evaluate several candidate models; this held-out set is called the validation set (or development set, or dev set). Multiple models are trained on the reduced training set, the one performing best on the validation set is selected, and that model is then retrained on the full training set.
Cross-validation: Technique used for the same purpose when holdout validation is impractical, because the validation set would be too small or the reduced training set much smaller than the full one. Cross-validation uses multiple validation sets: each model is evaluated once per validation set, after being trained on the rest of the data, and the evaluations are then averaged.
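The cross-validation procedure just described can be sketched as a simple k-fold loop. Everything here is an illustrative assumption (the function names, the mean-predicting toy "model", the MAE scoring), not part of the original text:

```python
def k_fold_scores(X, y, train_and_score, k=5):
    """Average a model's score over k folds: each fold is held out once
    as a validation set while the model trains on the remaining data."""
    fold_size = len(X) // k
    scores = []
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        X_val, y_val = X[start:stop], y[start:stop]          # one validation fold
        X_train = X[:start] + X[stop:]                       # the rest is training data
        y_train = y[:start] + y[stop:]
        scores.append(train_and_score(X_train, y_train, X_val, y_val))
    return sum(scores) / len(scores)

# Toy "model": always predicts the mean of its training labels,
# scored by mean absolute error on the validation fold.
def mean_model(X_train, y_train, X_val, y_val):
    prediction = sum(y_train) / len(y_train)
    return sum(abs(prediction - yv) for yv in y_val) / len(y_val)

X = list(range(20))
y = [2 * x for x in X]
avg_error = k_fold_scores(X, y, mean_model, k=5)
```

Averaging over folds gives a more reliable estimate than a single small validation set, at the cost of training each candidate model k times.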
Pipeline: Sequence of data processing components.
$m$: The number of examples in the dataset under consideration (e.g. a test set).
$\boldsymbol{x}^{\left(i\right)}$: Column vector containing all the feature values, excluding the label, of the $i$th example in the dataset.
$\boldsymbol{X}$: Matrix containing all the values of $\boldsymbol{x}^{\left(i\right)}$, one instance per row (transposed):
$$\boldsymbol{X}=\left[\begin{array}{c}
\left(\boldsymbol{x}^{\left(1\right)}\right)^{T}\\
\left(\boldsymbol{x}^{\left(2\right)}\right)^{T}\\
\vdots\\
\left(\boldsymbol{x}^{\left(m\right)}\right)^{T}
\end{array}\right]$$
$h$: Prediction function, or hypothesis. Given an example $\boldsymbol{x}^{\left(i\right)}$ in the training set, the prediction is:
$$\hat{y}^{\left(i\right)}=h\left(\boldsymbol{x}^{\left(i\right)}\right)$$
$\hat{y}^{\left(i\right)}-y^{\left(i\right)}$ is the prediction error.
Root mean square error (RMSE): Measure of the typical prediction error; it is a common performance measure for regression systems:
$$\mathrm{RMSE}\left(\boldsymbol{X},h\right)=\sqrt{\frac{1}{m}\cdot\sum_{i=1}^{m}\left(h\left(\boldsymbol{x}^{\left(i\right)}\right)-y^{\left(i\right)}\right)^{2}}$$
Mean absolute error (MAE): Another measure of the typical prediction error, less sensitive to outliers than the RMSE:
$$\mathrm{MAE}\left(\boldsymbol{X},h\right)=\frac{1}{m}\cdot\sum_{i=1}^{m}\left|h\left(\boldsymbol{x}^{\left(i\right)}\right)-y^{\left(i\right)}\right|$$
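Both error measures translate directly into code. This is a minimal standard-library sketch; the function names and the small prediction/target vectors are my own illustrative choices:

```python
import math

def rmse(predictions, targets):
    """Root mean square error: sqrt of the mean of squared prediction errors."""
    m = len(predictions)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / m)

def mae(predictions, targets):
    """Mean absolute error: mean of the absolute prediction errors."""
    m = len(predictions)
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / m

preds = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]
# mae(preds, targets) = 0.5; rmse is slightly larger because
# squaring weights the single error of 1.0 more heavily.
```

Because of the squaring, the RMSE is always at least as large as the MAE and penalizes a few big errors more than many small ones.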
Confusion matrix: A matrix used to measure the performance of a classification model. In a confusion matrix $C=\left[c_{i,j}\right]_{i,j=0}^{m-1}$, where $m$ here denotes the number of classes, $c_{i,j}$ represents how many times an instance of class $i$ is predicted as class $j$.
Accuracy (of a classifier): The fraction of samples correctly classified out of all the samples in the test set. Referring to the confusion matrix $C$:
$$A=\frac{\sum_{i=0}^{m-1}c_{i,i}}{\sum_{i=0}^{m-1}\sum_{j=0}^{m-1}c_{i,j}}$$
Precision (of a classifier): The fraction of samples actually belonging to class $k$ out of all the samples that the model predicted as class $k$. Referring to the confusion matrix $C$:
$$P=\frac{c_{k,k}}{\sum_{i=0}^{m-1}c_{i,k}}$$
Recall (of a classifier): The fraction of samples correctly predicted as class $k$ out of all the samples that actually belong to class $k$. Referring to the confusion matrix $C$:
$$R=\frac{c_{k,k}}{\sum_{j=0}^{m-1}c_{k,j}}$$
$F_1$ score: The harmonic mean of precision and recall:
$$F_{1}=\frac{2}{\frac{1}{R}+\frac{1}{P}}$$
Definitions used in machine learning.
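The classifier metrics above can all be computed directly from a confusion matrix. A minimal sketch, using the same convention as the definitions (row $i$ = true class, column $j$ = predicted class); the function names and the toy 2×2 matrix are my own:

```python
def accuracy(C):
    """Correct predictions (the diagonal) over all predictions."""
    total = sum(sum(row) for row in C)
    return sum(C[i][i] for i in range(len(C))) / total

def precision(C, k):
    """c_kk over the sum of column k (everything predicted as class k)."""
    return C[k][k] / sum(C[i][k] for i in range(len(C)))

def recall(C, k):
    """c_kk over the sum of row k (everything truly of class k)."""
    return C[k][k] / sum(C[k])

def f1(C, k):
    """Harmonic mean of precision and recall for class k."""
    p, r = precision(C, k), recall(C, k)
    return 2 / (1 / r + 1 / p)

# Toy confusion matrix: 6 true instances of class 0, 6 of class 1.
C = [[5, 1],
     [2, 4]]
```

For class 0 here, precision is 5/7 (two class-1 instances were wrongly predicted as class 0) while recall is 5/6 (one class-0 instance was missed), and $F_1$ sits between the two.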
