Random Forest Algorithm Clearly Explained!

Here, I’ve explained the Random Forest Algorithm with visualizations. You’ll also learn why the random forest is more robust than decision trees.


Content

0.21 -> Hello, people from the future, welcome to Normalized Nerd!
3.55 -> Today we’ll set up our camp in the Random Forest.
6.46 -> First, we’ll see why the random forest is better than our good old decision trees, and
13.62 -> then I’ll explain how it works with visualizations.
17.21 -> If you wanna see more videos like this, please subscribe to my channel and hit the bell icon
22.019 -> because I make videos about machine learning and data science regularly.
26.149 -> So without further ado let’s get started.
28.749 -> To begin our journey, we need a dataset.
31.63 -> Here I’m taking a small dataset with only 6 instances and 5 features.
37.739 -> As you can see, the target variable y takes 2 values, 0 and 1, hence it's a binary classification
45.41 -> problem.
46.41 -> First of all, we need to understand why we even need the random forest when we already
51.809 -> have decision trees.
53.53 -> Let’s draw the decision tree for this dataset.
65.28 -> Now if you don’t know what a decision tree really is or how it is trained then I’d
70.49 -> highly recommend watching my previous video.
72.649 -> In short, a decision tree splits the dataset recursively using the decision nodes until
79.219 -> we are left with pure leaf nodes.
82.31 -> And it finds the best split by maximizing the information gain, i.e. the reduction in entropy.
87.009 -> If a data sample satisfies the condition at a decision node then it moves to the left
92.71 -> child else it moves to the right and finally reaches a leaf node where a class label is
100.329 -> assigned to it.
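
To make this concrete, here is a minimal Python sketch (assuming scikit-learn is installed; the 6-by-5 binary dataset below is made up purely for illustration) of growing a decision tree with the entropy criterion until the leaves are pure:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Made-up dataset: 6 instances, 5 features (x0..x4), binary target y.
    X = np.array([
        [0, 1, 0, 1, 0],
        [1, 0, 1, 0, 1],
        [0, 0, 1, 1, 0],
        [1, 1, 0, 0, 1],
        [0, 1, 1, 0, 0],
        [1, 0, 0, 1, 1],
    ])
    y = np.array([0, 1, 0, 1, 0, 1])

    # Split recursively on the feature/threshold that maximizes information gain,
    # and keep splitting until every leaf is pure.
    tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
    tree.fit(X, y)
    print(export_text(tree, feature_names=["x0", "x1", "x2", "x3", "x4"]))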
101.43 -> So, what’s the problem with decision trees?
104.13 -> Let’s change our training data slightly.
107.259 -> Focus on the row with id 1.
109.95 -> We are changing the x0 and x1 features.
113.52 -> Now if we train our tree on this modified dataset we’ll get a completely different
119.72 -> tree.
123.88 -> This shows us that decision trees are highly sensitive to the training data which could
129.48 -> result in high variance.
131.59 -> So our model might fail to generalize.
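
Continuing the sketch above, one way to see this sensitivity is to flip the x0 and x1 values of the row with id 1 and refit; the printed tree structure will often change noticeably:

    # Flip x0 and x1 of the row with id 1 and retrain.
    X_mod = X.copy()
    X_mod[1, 0] = 1 - X_mod[1, 0]
    X_mod[1, 1] = 1 - X_mod[1, 1]

    tree_mod = DecisionTreeClassifier(criterion="entropy", random_state=0)
    tree_mod.fit(X_mod, y)
    # Compare with the first tree: a small change in the data can yield a very different tree.
    print(export_text(tree_mod, feature_names=["x0", "x1", "x2", "x3", "x4"]))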
134.83 -> Here comes the random forest algorithm.
137.28 -> It is a collection of multiple random decision trees and it’s much less sensitive to the
142.78 -> training data.
143.78 -> You can guess that we use multiple trees hence the name forest.
149.01 -> But why is it called random?
150.59 -> Keep this question in the back of your mind; you'll get the answer by the end of this
154.66 -> video.
155.66 -> Let me show you the process of creating a random forest.
158.72 -> The first step is to build new datasets from our original data.
163.95 -> To maintain simplicity we’ll build only 4.
167.79 -> We are gonna randomly select rows from the original data to build our new datasets.
172.95 -> And every dataset will contain the same number of rows as the original one.
178.51 -> Here’s the first dataset.
179.96 -> Due to lack of space, I’m writing only the row ids.
184.57 -> Notice that rows 2 and 5 appear more than once; that's because we are performing random
191.22 -> sampling with replacement.
193.22 -> That means after selecting a row we are putting it back into the data.
198.15 -> And here are the rest of the datasets.
205.55 -> The process we just followed to create new data is called Bootstrapping.
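
In code, bootstrapping is just sampling row indices with replacement. A rough sketch, continuing with the X defined in the earlier snippet (the seed 42 is an arbitrary assumption):

    # Build 4 bootstrapped datasets, each with as many rows as the original.
    rng = np.random.default_rng(42)
    n_rows = X.shape[0]
    bootstrap_indices = [rng.choice(n_rows, size=n_rows, replace=True) for _ in range(4)]
    print(bootstrap_indices)  # some row ids repeat within a sample, as expected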
211.27 -> Now we’ll train a decision tree on each of the bootstrapped datasets independently.
217.95 -> But here's a twist: we won't use every feature for training the trees.
223.06 -> We’ll randomly select a subset of features for each tree and use only them for training.
229.08 -> For example, in the first case, we’ll only use the features x0, x1.
235.34 -> Similarly, here are the subsets used for the remaining trees.
240.84 -> Now that we have got the data and the feature subsets let’s build the trees.
276.37 -> Just see how different the trees look from each other.
280.05 -> And this my friend is the random forest containing 4 trees.
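
Putting the last two steps together, here is a hedged sketch of how such a forest could be built by hand (continuing the earlier snippets): each tree gets one bootstrapped dataset and a random subset of 2 of the 5 features, so the exact feature subsets depend on the random seed rather than matching the ones in the video.

    # Train one tree per bootstrapped dataset, each on a random 2-feature subset.
    n_features = X.shape[1]
    forest = []
    for idx in bootstrap_indices:
        feats = rng.choice(n_features, size=2, replace=False)  # random feature subset
        t = DecisionTreeClassifier(criterion="entropy", random_state=0)
        t.fit(X[idx][:, feats], y[idx])   # rows from the bootstrap sample, selected columns only
        forest.append((t, feats))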
287.159 -> But how do we make a prediction using this forest?
287.159 -> Let’s take a new data point.
289.52 -> We’ll pass this data point through each tree one by one and note down the predictions.
321.729 -> Now we have to combine all the predictions.
324.889 -> As it's a classification problem, we'll take a majority vote.
328.52 -> Clearly, 1 is the winner hence the prediction from our random forest is 1.
334.849 -> This process of combining results from multiple models is called aggregation.
340.22 -> So in the random forest, we first perform bootstrapping and then aggregation, and in the
345.909 -> jargon, this combination is called bagging (short for bootstrap aggregating).
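
As a sketch of that aggregation step (continuing the hand-built forest above, with a made-up query point), the new point is routed through every tree and the majority class wins:

    # Majority vote over the 4 trees for one made-up query point.
    x_new = np.array([[1, 0, 1, 1, 0]])
    votes = [int(t.predict(x_new[:, feats])[0]) for t, feats in forest]
    prediction = max(set(votes), key=votes.count)  # most frequent class label
    print(votes, "->", prediction)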
348.62 -> Okay, so that was how we build a random forest. Now I should discuss some of the very important
354.909 -> points related to this.
358.34 -> Why is it called a random forest?
360.289 -> Because we have used two random processes, bootstrapping and random feature selection.
367.469 -> But what is the reason behind bootstrapping and feature selection?
371.65 -> Well, bootstrapping ensures that we are not using the same data for every tree so in a
378.529 -> way it helps our model to be less sensitive to the original training data.
383.969 -> The random feature selection helps to reduce the correlation between the trees.
388.71 -> If you use every feature then most of your trees will have the same decision nodes and
394.949 -> will act very similarly.
396.61 -> That'll keep the variance high, because averaging highly correlated trees doesn't reduce it much.
398.599 -> There’s another benefit of random feature selection.
402.189 -> Some of the trees will be trained on less important features, so they will give bad predictions,
409.52 -> but there will also be some trees whose errors go in the opposite direction,
415.86 -> so they will balance out.
417.719 -> Next point, what is the ideal size of the feature subset?
421.93 -> Well, in our case we took 2 features, which is close to the square root of the total number
427.809 -> of features, i.e. 5.
431.09 -> Researchers found that values close to the log and sqrt of the total number of features
436.999 -> work well.
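
For reference, with 5 features both heuristics land near 2. A quick check (only the standard library assumed); scikit-learn exposes the same idea through its max_features option:

    import math

    # Heuristic feature-subset sizes for 5 total features.
    print(math.sqrt(5))   # ~2.24
    print(math.log2(5))   # ~2.32
    # In scikit-learn, RandomForestClassifier(max_features="sqrt") or max_features="log2"
    # picks the subset size with the same rule of thumb.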
438.919 -> How do we use this for regression?
442.659 -> While combining the predictions just take the average and you are all set to use it
447.86 -> for regression problems.
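
In other words, for regression the aggregation step is just a mean. A tiny illustration with hypothetical per-tree outputs:

    # Hypothetical continuous predictions from three trees in a forest.
    tree_outputs = [3.1, 2.8, 3.4]
    forest_prediction = sum(tree_outputs) / len(tree_outputs)
    print(forest_prediction)  # ~3.1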
449.629 -> So that was all about it.
450.81 -> I hope now you have a pretty good understanding of the random forest.
454.53 -> If you enjoyed this video, please share this and subscribe to my channel.
458.689 -> Stay safe and thanks for watching!

Source: https://www.youtube.com/watch?v=v6VJ2RO66Ag