Performance Optimization

Set n_jobs=-1 on estimators and utilities that accept it to use all available CPU cores.
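A minimal sketch of how this might look in practice, assuming a synthetic dataset from make_classification and a random forest (both chosen only for illustration, not taken from the original notes):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data, used here only so the example is self-contained
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# n_jobs=-1 asks scikit-learn to use every available core when
# building (and predicting with) the trees of the forest
clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)

# Parallelism is kept at the estimator level only; cross_val_score
# is left at its default n_jobs to avoid nested parallelism
scores = cross_val_score(clf, X, y, cv=3)
```

Not every estimator accepts n_jobs, and the speed-up depends on the dataset size and the number of cores, so it is worth timing both settings.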

Multi-threading

# !pip install scikit-learn-intelex
from sklearnex import patch_sklearn
patch_sklearn()  # call before importing scikit-learn estimators so the patched versions are used

Config

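import sklearn

# assume_finite=True skips the finiteness check on inputs: less validation
# overhead, but no ValueError is raised if X contains NaN or infinity.
with sklearn.config_context(assume_finite=True):
    pass  # do learning/prediction here with reduced validation

# working_memory limits the size (in MiB) of temporary arrays in chunked operations
with sklearn.config_context(working_memory=128):
    pass  # do chunked work here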
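To make the effect of assume_finite concrete, here is a small sketch that uses sklearn.utils.check_array (the validation helper that estimators rely on) with an array containing NaN. The array and variable names are illustrative assumptions:

```python
import numpy as np
import sklearn
from sklearn.utils import check_array

# Illustrative input containing a NaN
X = np.array([[1.0, 2.0], [np.nan, 4.0]])

# Default configuration: validation rejects non-finite values
try:
    check_array(X)
    raised_by_default = False
except ValueError:
    raised_by_default = True

# With assume_finite=True the finiteness check is skipped,
# so the NaN passes through validation unnoticed
with sklearn.config_context(assume_finite=True):
    X_checked = check_array(X)
```

The trade-off is that bad values are no longer caught at the boundary, so this setting is best reserved for data that has already been cleaned.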

Model Compression

Linear models

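from sklearn.linear_model import SGDRegressor

# the L1 part of the elastic-net penalty drives some coefficients to exactly zero
model = SGDRegressor(penalty='elasticnet', l1_ratio=0.25)
model.fit(X_train, y_train)
model.sparsify()  # store coef_ as a scipy sparse matrix; helps only when most coefficients are zero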
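A self-contained sketch of the same idea, assuming synthetic data from make_regression (not part of the original notes). It checks how many coefficients are zero before converting, and confirms that predictions are unchanged after sparsify():

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Synthetic problem where only a few features are informative
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=1.0, random_state=0
)

model = SGDRegressor(penalty="elasticnet", l1_ratio=0.25, random_state=0)
model.fit(X, y)

# Fraction of exactly-zero coefficients; sparsify() pays off mainly
# when this fraction is large
zero_fraction = float(np.mean(model.coef_ == 0))
dense_pred = model.predict(X)

# Convert coef_ to a scipy sparse matrix in place
model.sparsify()
sparse_pred = model.predict(X)
```

If zero_fraction is small, the sparse representation can use more memory than the dense one, so it is worth checking before converting.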

Warm Start

Reuses the result of the previous call to fit as the starting point for the next one, instead of training from scratch

Useful for hyper-parameter tuning

from sklearn.ensemble import RandomForestClassifier

max_estimators = 100

rf = RandomForestClassifier(warm_start=True)

n_estimators = 1
while n_estimators <= max_estimators:
    rf.set_params(n_estimators=n_estimators)
    rf.fit(X_train, y_train)  # trains only the newly added trees

    n_estimators *= 2

The advantage is that the estimators fitted under the previous setting are kept: each subsequent call to fit trains only the additional trees instead of refitting the whole forest, so we can cheaply check whether adding more estimators benefits the model.
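As a runnable sketch of the loop above, assuming a synthetic dataset and a held-out validation split (X_val, y_val and the scores dictionary are illustrative names, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data so the example is self-contained
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(warm_start=True, random_state=0)

scores = {}        # validation accuracy per forest size
first_tree = None  # kept to show that earlier trees are reused
n_estimators = 1
while n_estimators <= 64:
    rf.set_params(n_estimators=n_estimators)
    rf.fit(X_train, y_train)  # adds trees to the existing forest
    if first_tree is None:
        first_tree = rf.estimators_[0]
    scores[n_estimators] = rf.score(X_val, y_val)
    n_estimators *= 2
```

After the loop, rf.estimators_[0] is still the tree trained in the first iteration, and scores shows how validation accuracy changes as the forest grows.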

Mini-Batch/Online Learning

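from sklearn.linear_model import SGDRegressor

# LinearRegression has no partial_fit; use an estimator that supports
# incremental learning, such as SGDRegressor
model = SGDRegressor()

# each call updates the model with one batch of features and targets
model.partial_fit(X_1, y_1)
model.partial_fit(X_2, y_2)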
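A minimal sketch of mini-batch training over a dataset that is processed in slices, assuming synthetic data and a batch size of 100 (both are illustrative choices). In a real online setting the batches would come from a stream or from files read in chunks:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic regression data standing in for a large or streaming dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_coef = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_coef + rng.normal(scale=0.1, size=1000)

model = SGDRegressor(random_state=0)

batch_size = 100
n_batches = 0
for start in range(0, len(X), batch_size):
    stop = start + batch_size
    # Each call performs one pass of SGD over this batch only,
    # continuing from the coefficients learned so far
    model.partial_fit(X[start:stop], y[start:stop])
    n_batches += 1
```

Because SGD is sensitive to feature scale, real data would usually be standardized first, for example with a scaler that is itself fitted incrementally.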

Config

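import sklearn

# skip_parameter_validation (scikit-learn >= 1.3) skips the checks on
# estimator hyper-parameters; assume_finite skips the finiteness check on inputs
with sklearn.config_context(
    assume_finite=True,
    skip_parameter_validation=True
):
    pass  # do learning/prediction here

Last Updated: 2024-12-26 ; Contributors: AhmedThahir, web-flow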