- Basic understanding of traditional machine learning models.
- Feature engineering of data.
Feature engineering is the process of extracting the distinguishing features of the data. Together, these features define a particular instance of the data. Feature engineering demands domain knowledge of the data being dealt with, which is why it is most applicable to traditional machine learning models.
A single distinguishing feature contributes only minimally to the definition of a particular instance. A classifier based on one feature is therefore a weak learner, because one feature alone cannot generalise the overall definition of the data. Data is defined by a combination of features, which together make an instance unique within its domain, and weak learners fail to capture this on their own.
Consider a classification task such as predicting whether a picture shows a cat or a dog. The defining aspects of these two animals include the width of the mouth, the sharpness of the claws, the size of the limbs, the shape of the eyes, the size of the ears, and the size of the animal as a whole. Together, these disparate aspects help us identify whether the animal in an image is a dog or a cat. If we were to classify an image based on a single rule, the prediction would be flawed. Each such rule, taken individually, is called a weak learner, because on its own it is not strong enough to make reliable predictions.
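A minimal sketch of what a single-rule weak learner looks like, using invented synthetic "animal" data (the feature values, thresholds, and feature names are assumptions for illustration, not real measurements):

```python
import random

random.seed(0)

# Hypothetical toy data: each animal is (ear_shape, snout_length, label),
# with label 1 = dog, 0 = cat.  The distributions are invented so that
# dogs tend to score higher on both features.
def make_animal(label):
    if label == 1:
        return (random.gauss(0.7, 0.2), random.gauss(0.8, 0.2), 1)
    return (random.gauss(0.3, 0.2), random.gauss(0.3, 0.2), 0)

data = [make_animal(random.choice([0, 1])) for _ in range(200)]

# A "weak learner": one rule on a single feature.
def stump(x, feature, threshold):
    return 1 if x[feature] > threshold else 0

# Accuracy of one rule alone -- better than chance, but far from perfect,
# because one feature cannot define the whole instance.
acc = sum(stump(x, 0, 0.5) == x[2] for x in data) / len(data)
print(f"single-rule accuracy: {acc:.2f}")
```

The single rule misclassifies the overlap between the two feature distributions, which is exactly why such rules are called weak learners.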
Ensemble learning is a method that enhances the performance of a machine learning model by combining several learners. There are two types of ensemble learning: parallel ensemble learning, also known as bagging, and sequential ensemble learning, also known as boosting.
Bootstrapping refers to random sampling with replacement. It allows us to better understand the bias and the variance of a dataset. Bootstrapping involves randomly sampling a small subset of examples from the dataset, with replacement, so every example has an equal probability of being selected on each draw and the same example may appear more than once. This method can also be used to estimate statistics such as the mean and standard deviation of the dataset.
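The sampling-with-replacement idea above can be sketched with the standard library alone (the dataset values are invented for illustration):

```python
import random
import statistics

random.seed(42)

# Original dataset (some measured quantity; values are made up).
data = [4.2, 5.1, 6.3, 4.8, 5.9, 5.5, 4.4, 6.0, 5.2, 4.9]

# Draw many bootstrap samples: each the same size as the dataset,
# sampled uniformly WITH replacement, so every example has equal
# probability on each draw and can appear more than once.
boot_means = []
for _ in range(1000):
    sample = random.choices(data, k=len(data))  # with replacement
    boot_means.append(statistics.mean(sample))

# The spread of the bootstrap means estimates the variability
# (standard error) of the sample mean.
print(f"dataset mean:             {statistics.mean(data):.2f}")
print(f"mean of bootstrap means:  {statistics.mean(boot_means):.2f}")
print(f"bootstrap std. error:     {statistics.stdev(boot_means):.3f}")
```

`random.choices` samples with replacement, which is what distinguishes the bootstrap from an ordinary subsample.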
In bagging, weak learners are produced in parallel by bootstrapping the data and training one learner on each bootstrap sample. The Random Forest algorithm is a well-known example.
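A minimal bagging sketch on invented one-dimensional data: each weak learner (a single-threshold stump) is fit on its own bootstrap sample, and predictions are combined by majority vote. The data and the stump-fitting rule are assumptions for illustration, not a production recipe.

```python
import random
from collections import Counter

random.seed(1)

# Toy 1-D data: class 1 tends to have larger values (invented).
data = [(random.gauss(0.0, 1.0), 0) for _ in range(100)] + \
       [(random.gauss(2.0, 1.0), 1) for _ in range(100)]

def train_stump(sample):
    # Pick the threshold that best separates the bootstrap sample.
    best_t, best_acc = 0.0, 0.0
    for t in [x for x, _ in sample]:
        acc = sum((x > t) == bool(y) for x, y in sample) / len(sample)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Bagging: each weak learner gets its own bootstrap sample; the fits
# are independent of one another, so in principle they can run in parallel.
thresholds = [train_stump(random.choices(data, k=len(data)))
              for _ in range(25)]

def predict(x):
    votes = Counter(int(x > t) for t in thresholds)  # majority vote
    return votes.most_common(1)[0][0]

acc = sum(predict(x) == y for x, y in data) / len(data)
print(f"bagged-ensemble accuracy: {acc:.2f}")
```

Random Forest follows the same pattern, with decision trees as the weak learners and additional randomness in the features each tree may use.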
Boosting is a class of ensemble learning techniques that combines a set of weak learners into a strong learner in order to increase the accuracy of the model. The weak learners are built sequentially, and their predictions are combined, for example by majority rule or by a weighted average, turning the collection of weak learners into a single strong learner.
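A compact AdaBoost-style sketch of the sequential idea, on invented noisy 1-D data: each round reweights the examples so the next weak learner focuses on what the ensemble still gets wrong, and the final prediction is a weighted combination. This is a simplified illustration under those assumptions, not a full AdaBoost implementation.

```python
import math
import random

random.seed(2)

# Toy 1-D data with labels in {-1, +1}; values are invented.
X = [random.uniform(0, 10) for _ in range(200)]
y = [1 if x > 5 else -1 for x in X]
for i in random.sample(range(200), 20):
    y[i] = -y[i]  # flip some labels so no single stump is perfect

def best_stump(X, y, w):
    # Choose the threshold/polarity minimising the weighted error.
    best = None
    for t in X:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (pol if xi > t else -pol) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

# Sequential loop: reweight the data after each round.
w = [1 / len(X)] * len(X)
ensemble = []  # (alpha, threshold, polarity)
for _ in range(10):
    err, t, pol = best_stump(X, y, w)
    err = max(err, 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)  # learner weight
    ensemble.append((alpha, t, pol))
    # Up-weight misclassified points, then renormalise.
    w = [wi * math.exp(-alpha * yi * (pol if xi > t else -pol))
         for xi, yi, wi in zip(X, y, w)]
    s = sum(w)
    w = [wi / s for wi in w]

def predict(x):
    # Weighted vote of the weak learners.
    score = sum(a * (p if x > t else -p) for a, t, p in ensemble)
    return 1 if score > 0 else -1

acc = sum(predict(xi) == yi for xi, yi in zip(X, y)) / len(X)
print(f"boosted-ensemble accuracy: {acc:.2f}")
```

The contrast with bagging is the dependence between rounds: each learner here is shaped by the mistakes of the ones before it, which is why boosting is called sequential ensemble learning.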
References: https://youtu.be/kho6oANGu_A