Take cross-validation to the next level!
Stratified cross-validation keeps every class represented in every fold, something plain k-fold can't guarantee.
I will explain with an example ↓
Consider this:
You work with a dataset that has 3 equally distributed classes, sorted by class: all of Class 1 first, then Class 2, then Class 3.
You run 3-fold cross-validation, so the first fold is the first 1/3 of the data.
What will happen?
The first fold will contain only Class 1.
The model is trained on Classes 2 and 3 and tested on Class 1, which it has never seen. Accuracy: 0. Simple k-fold fails here.
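Here's a minimal sketch of that failure, assuming scikit-learn and its iris dataset (my choice, not from the thread; iris ships sorted by class, 50 samples each):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)  # 150 rows, sorted by class: 50/50/50

model = LogisticRegression(max_iter=1000)
kfold = KFold(n_splits=3)  # no shuffling: folds line up exactly with the classes
scores = cross_val_score(model, X, y, cv=kfold)
print(scores)  # [0. 0. 0.] -- each test fold holds a class the model never saw
```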
The solution is stratified cross-validation.
We split the data so that the class proportions in each fold match the proportions in the whole dataset.
Here, each fold contains 1/3 of the samples from each class, so every training and test split sees all 3 classes.
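A sketch of the fix, again assuming scikit-learn and the same iris setup:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=1000)
skf = StratifiedKFold(n_splits=3)  # each fold keeps the 1/3-per-class proportions
scores = cross_val_score(model, X, y, cv=skf)
print(scores)  # every fold now scores well, e.g. ~0.96-0.98
```

Worth knowing: when you pass an integer cv to cross_val_score with a classifier, scikit-learn already uses stratified folds by default.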
Stratified cross-validation is more reliable than plain k-fold for classification.
With plain k-fold it's not rare for a single fold to contain data from only one class, which skews the scores.
For regression, however, plain k-fold is the usual strategy: the target is continuous, so there are no class proportions to preserve, and stratifying a continuous target is harder.
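A short sketch of the regression case, with synthetic data as a stand-in (make_regression and the shuffle choice are my assumptions, not the thread's):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)

kfold = KFold(n_splits=3, shuffle=True, random_state=0)  # shuffle breaks any ordering
scores = cross_val_score(LinearRegression(), X, y, cv=kfold)
print(scores)  # one R^2 score per fold
```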
That's it for today.
I have read about this technique in the book:
Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido
Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.
Thanks πŸ˜‰
