Take cross-validation to the next level!
Stratified cross-validation keeps every class represented in every fold, something plain k-fold can't guarantee.
I will explain with an example ↓
Consider this:
You work with a dataset that has 3 equally distributed classes, sorted by class: all of Class 1 first, then Class 2, then Class 3.
You run 3-fold cross-validation, so the first fold is the first 1/3 of the data.
What will happen?
The first fold will contain only Class 1.
The model is trained on Classes 2 and 3 and tested on Class 1, which it has never seen. Accuracy: 0. Simple k-fold fails here.
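Here's a minimal sketch of that failure, assuming scikit-learn and its iris dataset (my choice, not from the thread; iris ships sorted by class, 50 samples each):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)  # 150 rows, sorted by class: 50/50/50

model = LogisticRegression(max_iter=1000)
kfold = KFold(n_splits=3)  # no shuffling: folds line up exactly with the classes
scores = cross_val_score(model, X, y, cv=kfold)
print(scores)  # [0. 0. 0.] -- each test fold holds a class the model never saw
```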
The solution is stratified cross-validation.
We split the data so that the class proportions in each fold match the proportions in the whole dataset.
Here, each fold contains 1/3 of the samples from each class, so every training and test split sees all 3 classes.
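A sketch of the fix, again assuming scikit-learn and the same iris setup:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=1000)
skf = StratifiedKFold(n_splits=3)  # each fold keeps the 1/3-per-class proportions
scores = cross_val_score(model, X, y, cv=skf)
print(scores)  # every fold now scores well, e.g. ~0.96-0.98
```

Worth knowing: when you pass an integer cv to cross_val_score with a classifier, scikit-learn already uses stratified folds by default.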
Stratified cross-validation is more reliable than plain k-fold for classification.
With plain k-fold it's not rare for a single fold to contain data from only one class, which skews the scores.
For regression, however, plain k-fold is the usual strategy: the target is continuous, so there are no class proportions to preserve, and stratifying a continuous target is harder.
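A short sketch of the regression case, with synthetic data as a stand-in (make_regression and the shuffle choice are my assumptions, not the thread's):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)

kfold = KFold(n_splits=3, shuffle=True, random_state=0)  # shuffle breaks any ordering
scores = cross_val_score(LinearRegression(), X, y, cv=kfold)
print(scores)  # one R^2 score per fold
```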
That's it for today.
I have read about this technique in the book:
Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido
Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.
Thanks πŸ˜‰
