# Performance Optimization
## n_jobs=-1

Multi-threading: set `n_jobs=-1` to use all available CPU cores in estimators and utilities that support parallelism.
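A minimal sketch of how this is typically passed, using `RandomForestClassifier` purely as an illustrative example (the estimator choice and the `X_train`/`y_train` names are assumptions, not from these notes):

```python
from sklearn.ensemble import RandomForestClassifier

# n_jobs=-1 asks scikit-learn to use all available CPU cores
# to build the individual trees in parallel.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
clf.fit(X_train, y_train)   # assumes X_train, y_train are defined elsewhere
clf.predict(X_train)        # prediction is parallelized across cores as well
```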
## Config
```python
import sklearn

with sklearn.config_context(assume_finite=True):
    # Reduce validation overhead: no ValueError is raised
    # if X contains NaN or infinity.
    pass  # do learning/prediction here with reduced validation
```
## Model Compression
Linear models trained with an L1 or elastic-net penalty can be compressed by converting the fitted coefficients to a sparse representation with `sparsify()`:
```python
from sklearn.linear_model import SGDRegressor

# The elastic-net penalty drives some coefficients to exactly zero
model = SGDRegressor(penalty='elasticnet', l1_ratio=0.25)
model.fit(X_train, y_train)

# Convert coef_ to a scipy sparse matrix: less memory, faster prediction
model.sparsify()
```
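As a quick check of how much was gained (an illustrative sketch, assuming the fitted `model` from above), the density of the sparsified coefficient matrix can be inspected:

```python
# After sparsify(), coef_ is a scipy.sparse CSR matrix; .nnz counts stored non-zeros
coef = model.coef_
density = coef.nnz / (coef.shape[0] * coef.shape[1])
print(f"coefficient density: {density:.1%}")  # lower density = better compression
```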
## Warm Start
- Useful for re-using previous training as initial values
- Useful for hyper-parameter tuning
```python
from sklearn.ensemble import RandomForestClassifier

max_estimators = 100
rf = RandomForestClassifier(warm_start=True)

n_estimators = 1
while n_estimators <= max_estimators:
    rf.n_estimators = n_estimators
    # With warm_start=True, trees grown in earlier iterations are kept;
    # only the newly requested trees are trained.
    rf.fit(X_train, y_train)
    n_estimators *= 2
```
The advantage is that the estimators fitted under the previous setting are kept: each subsequent call to `fit` starts from those already-trained estimators and only grows the new ones, so we are just checking whether adding estimators benefits the model instead of refitting from scratch.
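A minimal sketch of that check, assuming a held-out split `X_val`/`y_val` (names introduced here for illustration, not from the original notes):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(warm_start=True)

n_estimators = 1
while n_estimators <= 100:
    rf.n_estimators = n_estimators
    rf.fit(X_train, y_train)        # only the new trees are trained
    score = rf.score(X_val, y_val)  # accuracy on the held-out split
    print(f"{n_estimators:>3} trees -> validation accuracy {score:.3f}")
    n_estimators *= 2
```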