Machine Learning | February 18, 2025 | 5 min read

CatBoost's Secret Weapon: Ordered Target Encoding Explained

How CatBoost handles categorical features without data leakage using ordered target encoding — and why this gives it an edge on datasets with many categoricals.

The Data Leakage Problem

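Naive target encoding leaks the label into the features: when you replace a category with the mean target computed over all rows in that category, each row's own target contributes to its encoded value. For rare categories the encoding is close to a copy of the label, so the model looks better on the training data than it will on unseen data.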
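To make the leak concrete, here is a minimal sketch of naive mean target encoding in plain Python. The function name naive_target_encode and the toy city/label data are made up for illustration and are not part of CatBoost.

```python
from collections import defaultdict


def naive_target_encode(categories, targets):
    """Encode each row with the mean target of its category, using ALL rows.

    Hypothetical helper for illustration; this is the leaky approach.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Every row's own target is already included in sums[cat] here.
    return [sums[cat] / counts[cat] for cat in categories]


# Toy data (assumed): "oslo" and "lima" each appear only once.
cities = ["paris", "rabat", "paris", "oslo", "rabat", "lima"]
labels = [1, 0, 0, 1, 0, 0]

encoded = naive_target_encode(cities, labels)
for city, y, e in zip(cities, labels, encoded):
    print(f"{city:6s} label={y} encoded={e:.2f}")
```

For the two cities that occur once, the encoded value equals the label exactly, which is the extreme case of the leak described above.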

CatBoost's Solution: Ordered Target Encoding

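CatBoost draws a random permutation of the training rows. When encoding the row at position i of that permutation, it uses target statistics only from rows at positions 0..i-1 that share the same category, smoothed with a prior so the first occurrences of a category still get a sensible value. A row's own target never enters its encoding, which prevents this form of leakage by construction.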
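The idea can be sketched in a few lines of plain Python. This is a simplified illustration, not CatBoost's actual implementation: the function name, the prior and prior_weight parameters, the seed, and the single permutation are all assumptions made for the example.

```python
import random
from collections import defaultdict


def ordered_target_encode(categories, targets, prior=0.5, prior_weight=1.0, seed=0):
    """Encode each row from the targets of rows that precede it in a random order.

    Illustrative sketch only; parameter names and defaults are assumptions.
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random processing order of the rows

    sums = defaultdict(float)   # running sum of targets seen so far, per category
    counts = defaultdict(int)   # running count of rows seen so far, per category
    encoded = [0.0] * n

    for idx in order:
        cat = categories[idx]
        # Use only statistics from rows processed earlier, smoothed by the prior.
        encoded[idx] = (sums[cat] + prior * prior_weight) / (counts[cat] + prior_weight)
        # Only now does this row's own target join the running statistics.
        sums[cat] += targets[idx]
        counts[cat] += 1
    return encoded


cities = ["paris", "rabat", "paris", "oslo", "rabat", "lima"]
labels = [1, 0, 0, 1, 0, 0]
print(ordered_target_encode(cities, labels))
```

In this sketch a category seen for the first time receives the prior (0.5 here), and changing a row's label never changes that row's own encoded value, only the values of rows processed after it.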

Implementation

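from catboost import CatBoostClassifier

# X_train is assumed to be a pandas DataFrame that contains the three
# columns below as raw strings; y_train holds the class labels.
model = CatBoostClassifier(
    cat_features=['city', 'device_type', 'email_domain'],  # encoded internally by CatBoost
    iterations=1000,      # number of boosting rounds (trees)
    learning_rate=0.05,
    depth=6,              # depth of each tree
)
model.fit(X_train, y_train)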

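No manual encoding is needed: pass the raw string columns directly, and CatBoost encodes every column listed in cat_features internally. Values in those columns must be strings or integers, so convert missing values to a placeholder string such as 'missing' before fitting.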

Tags: CatBoost, Categorical Features, Target Encoding, Gradient Boosting

Ossama Elhakki

AI Engineer & ML Systems Builder — Morocco
