Bagging vs Boosting
Harine

In daily life, we often make decisions by weighing different possibilities, much as a decision tree does. In organizations, decision trees are widely used in supervised machine learning to analyze data and support better decisions, which can improve efficiency and profits.

Sometimes a single decision tree does not give the best result, because it can overfit the training data. To improve accuracy, ensemble methods are used. Ensemble learning combines multiple weak models (usually decision trees) into a stronger, more accurate model. The main idea is that many weak learners working together can produce a strong learner. Two popular ensemble techniques are Bagging and Boosting.
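
To see why combining weak learners can help, consider an idealized case: n classifiers that are each correct with probability p and make their mistakes independently. The sketch below (the function name majority_vote_accuracy is our own) computes the chance that a majority vote of such classifiers is correct. Independence is a strong assumption that real trees trained on the same data only approximate, which is why bagging and Random Forest deliberately try to make their trees differ.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a majority of n_models independent classifiers,
    each correct with probability p, votes for the right answer (n_models odd)."""
    needed = n_models // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

With p = 0.6, a single model is right 60% of the time, while five independent voters are right about 68% of the time (0.68256), and the accuracy keeps rising as more independent voters are added.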

Bagging (Bootstrap Aggregating)

Bagging is used mainly to reduce the variance of a model and improve prediction accuracy.
In bagging, multiple random subsets (bootstrap samples) are drawn from the original training
dataset using sampling with replacement, so some observations appear more than once in a subset
and others not at all. Each subset is used to train a separate decision tree. After all trees are
trained, their predictions are combined (by averaging for regression or majority voting for
classification) to produce the final result.
This approach works better than using a single decision tree because averaging many trees trained
on different samples reduces the effect of overfitting.
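
The bagging procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: to keep it short, each "tree" is a depth-1 regression tree (a decision stump), and the helper names fit_stump, bagging_fit and bagging_predict are hypothetical. The key lines are the bootstrap draw (n indices chosen with replacement) and the final averaging.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a depth-1 regression tree (a "stump"): try every feature and threshold,
    keep the split with the lowest sum of squared errors, predict each side's mean."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send at least one point each way
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every row identical: no split possible, predict the mean
        c = mean(y)
        return lambda row: c
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train n_estimators stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws *with replacement*
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Regression: average the individual models' predictions."""
    return mean(m(row) for m in models)


X = [[x] for x in range(10)]
y = [0.0] * 5 + [10.0] * 5
models = bagging_fit(X, y)
print(bagging_predict(models, [1]), bagging_predict(models, [8]))
```

Points well inside each half of this toy step function should come out close to 0 and close to 10 respectively; because every stump saw a slightly different sample, their individual errors tend to average out.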

Random Forest

Random Forest is an extension of bagging.
In addition to training each tree on a bootstrap sample of the data, Random Forest considers only
a random subset of features at each split while building a tree. This makes the trees less similar
to one another, and the combined predictions of these trees form the final result.

Steps in Random Forest

  • Assume the training dataset contains X observations and Y features.
  • Draw a random sample of X observations from the dataset with replacement (a bootstrap sample).
  • Build a decision tree on the sampled data, considering only a random subset of the Y features at each split.
  • Repeat the process many times to create many trees.
  • Combine the predictions of all trees to obtain the final prediction: majority vote for classification, average for regression.
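
The steps above can be sketched as a tiny Random Forest classifier in plain Python. This is a simplified illustration under stated assumptions: each tree is a depth-1 classification stump, so a tree has only one split and choosing a random feature subset per tree is the same as choosing it per split; the names fit_stump, random_forest_fit, random_forest_predict and the max_features parameter are our own.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Depth-1 classification tree that may only split on the given feature indices.
    Each side predicts its majority label; keep the split with the fewest mistakes."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ll, rl = majority(left), majority(right)
            errors = sum(v != ll for v in left) + sum(v != rl for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, ll, rl)
    if best is None:  # no usable split: predict the overall majority label
        label = majority(y)
        return lambda row: label
    _, j, t, ll, rl = best
    return lambda row: ll if row[j] <= t else rl


def random_forest_fit(X, y, n_trees=50, max_features=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample of observations
        feats = rng.sample(range(n_features), max_features)  # random subset of features
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def random_forest_predict(trees, row):
    """Classification: majority vote over all trees."""
    return majority([tree(row) for tree in trees])


X = [[i, 2 * i] for i in range(10)]
y = ["a"] * 5 + ["b"] * 5
trees = random_forest_fit(X, y, n_trees=50, max_features=1)
print(random_forest_predict(trees, [1, 2]), random_forest_predict(trees, [8, 16]))
```

Because each tree sees different rows and a different feature, the trees disagree on details, and the majority vote smooths those disagreements out. Real implementations grow much deeper trees than these stumps.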

Advantages of Random Forest

  • Works well with large and high-dimensional datasets.
  • Some implementations can handle missing values directly.
  • Usually provides high prediction accuracy.

Disadvantages of Random Forest

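  • For regression problems, the final prediction is the average of many trees, so it is smoothed and can never fall outside the range of target values seen during training. As a result, it may not give highly precise values, especially when extrapolating.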

Boosting

Boosting is another ensemble technique used to improve model performance.

In boosting, decision trees are built sequentially instead of independently. Each new tree focuses on correcting the errors made by the trees built before it.

In AdaBoost, a classic boosting algorithm, if a data point is misclassified by a model, its importance (weight) is increased. This makes the next model focus more on predicting that data point correctly. By combining many such models, boosting converts weak learners into a strong predictive model.
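
The reweighting described above is the idea behind AdaBoost. The sketch below is a minimal discrete AdaBoost for labels -1/+1, assuming depth-1 trees (stumps) as the weak learners; the function names are hypothetical. Each round fits a stump to the weighted data, gives it a vote alpha based on its weighted error, then multiplies the weights of misclassified points by e^alpha and of correctly classified points by e^-alpha before renormalizing.

```python
import math


def fit_weighted_stump(X, y, w):
    """Depth-1 classifier for labels in {-1, +1} that minimizes *weighted* error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # This stump predicts `sign` when row[j] <= t and `-sign` otherwise.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] <= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    err, j, t, sign = best
    return err, (lambda row: sign if row[j] <= t else -sign)


def adaboost_fit(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # start with equal weights
    ensemble = []
    for _ in range(n_rounds):
        err, h = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against division by zero / log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate stumps get a bigger vote
        # y*h(x) is -1 for a mistake, so misclassified points are multiplied by e^alpha.
        w = [wi * math.exp(-alpha * yi * h(row)) for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]             # renormalize so the weights sum to 1
        ensemble.append((alpha, h))
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps."""
    score = sum(alpha * h(row) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


X = [[x] for x in range(6)]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, n_rounds=3)
print([adaboost_predict(model, row) for row in X])  # → [1, 1, -1, -1, 1, 1]
```

On this toy data no single stump can separate the labels (the positive class sits at both ends), but because the points each stump gets wrong are upweighted for the next round, three rounds are enough for the weighted vote to classify all six training points correctly.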

Gradient Boosting

Gradient Boosting is a powerful extension of the boosting method.

It combines the ideas of Boosting and Gradient Descent optimization.

Gradient Boosting = Gradient Descent + Boosting
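
One standard way to write this down treats boosting as gradient descent in function space. In the sketch below, F_m is the model after m trees, h_m is the m-th tree, L is the loss, r_{im} are the "pseudo-residuals" for observation i, and \nu is a learning rate (shrinkage factor); these symbols are introduced here for illustration, and a fixed \nu replaces the line search used in some formulations.

```latex
F_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)

r_{im} = -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)} \right]_{F = F_{m-1}},
\qquad i = 1, \dots, n

h_m = \text{regression tree fitted to } \{(x_i, r_{im})\}_{i=1}^{n}

F_m(x) = F_{m-1}(x) + \nu \, h_m(x)

\text{Squared loss } L(y, F) = \tfrac{1}{2}(y - F)^2
\;\Longrightarrow\; r_{im} = y_i - F_{m-1}(x_i)
```

The last line shows why, for squared loss, "fit the next tree to the negative gradient" and "fit the next tree to the residuals" are the same thing.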

In this method:

  • Trees are built one after another.
  • Each new tree is fitted to reduce the error (loss) of the current model by following the negative gradient of the loss function.
  • For squared-error loss, this negative gradient is simply the residual: the difference between the actual value and the predicted value.
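
The procedure above can be sketched for squared-error loss in plain Python. This is a minimal illustration, assuming depth-1 regression trees (stumps) as the base learners; fit_stump, gradient_boost_fit, gradient_boost_predict and the n_rounds and learning_rate parameters are hypothetical names. Each round computes the residuals of the current model, fits a stump to them, and adds a scaled-down copy of that stump.

```python
from statistics import mean


def fit_stump(X, r):
    """Depth-1 regression tree fitted to targets r by least squares."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no split possible: predict the mean
        c = mean(r)
        return lambda row: c
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def gradient_boost_fit(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss, using stumps as weak learners."""
    f0 = mean(y)  # starting model: the constant that minimizes squared error
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_rounds):
        # Negative gradient of 1/2 * (y - F)^2 with respect to F is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(X, residuals)
        trees.append(h)
        # Take a small step in the direction suggested by the new tree.
        pred = [pi + learning_rate * h(row) for pi, row in zip(pred, X)]
    return f0, learning_rate, trees


def gradient_boost_predict(model, row):
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(h(row) for h in trees)


X = [[x] for x in range(10)]
y = [float(x * x) for x in range(10)]
model = gradient_boost_fit(X, y)
y_bar = mean(y)
baseline = mean((yi - y_bar) ** 2 for yi in y)  # error of always predicting the mean
boosted = mean((gradient_boost_predict(model, row) - yi) ** 2 for row, yi in zip(X, y))
print(baseline, boosted)
```

The small learning rate means each tree corrects only part of the remaining error, which is one reason the learning rate and the number of rounds are the main hyperparameters to tune.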

Advantages of Gradient Boosting

  • Supports different types of loss functions.
  • Works well for capturing complex relationships and interactions in data.

Disadvantages of Gradient Boosting

  • Requires careful tuning of hyperparameters to achieve good performance.
  • Training can be slower than with bagging-based methods, because the trees must be built one after another rather than in parallel.
