Loading a large trained machine learning model with joblib

Date: 2020-07-19 20:43:36

Tags: python scikit-learn pickle random-forest joblib

I am trying to train a Random Forest classifier.

Here is my code:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
import pickle

# Load pre-extracted training features and labels
tr_features = pickle.load(open('./dataset/train_features.dump', 'rb'))
tr_labels = pickle.load(open('./dataset/train_labels.dump', 'rb'))

# Turn the feature dicts into a sparse feature matrix
vectorizer = DictVectorizer()
vectorizer.fit(tr_features)
vectorized_features = vectorizer.transform(tr_features)

# Grid-search a random forest over tree count and depth
rf = RandomForestClassifier()
parameters = {
    'n_estimators': [5, 50, 250],
    'max_depth': [2, 4, 8, 16, 32, None]
}

cv = GridSearchCV(rf, parameters, cv=5)
cv.fit(vectorized_features, tr_labels)

After training completes, I dump the model with joblib as follows:

import joblib
joblib.dump(cv.best_estimator_, './models/RF_model.pkl')

Later, when I load the model to run it on the test data:

import joblib
rf = joblib.load('./models/RF_model.pkl')

When I run the line above, the Jupyter notebook's kernel restarts. When I check the size of the RF_model.pkl file, it is 422.7 MB.

I tried a suggested solution and passed the compress=3 argument to the joblib.dump() method:

joblib.dump(cv.best_estimator_, './models/RF_model.pkl', compress=3)

Even though the size dropped to 43 MB, the kernel still restarts and I cannot load the model.
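One thing I have considered (a sketch, not something I have verified on the real model): joblib.load accepts a mmap_mode argument that memory-maps the NumPy arrays stored in the file instead of reading them all into RAM at once. This only works on an uncompressed dump, so it would mean dumping again without compress. The file path and array below are placeholders for demonstration; the real call would target ./models/RF_model.pkl.

```python
import os
import tempfile

import joblib
import numpy as np

# Demonstrate with a plain array; the same call would apply to the model file,
# e.g. joblib.load('./models/RF_model.pkl', mmap_mode='r').
path = os.path.join(tempfile.mkdtemp(), 'big_array.pkl')

# NOTE: no compress argument -- mmap_mode requires an uncompressed dump.
joblib.dump(np.arange(1_000_000), path)

# With mmap_mode='r', arrays come back as read-only numpy.memmap objects
# that are paged in from disk on demand rather than loaded into RAM.
arr = joblib.load(path, mmap_mode='r')
print(type(arr))
```

Whether this actually lowers peak memory for a fitted RandomForestClassifier depends on how scikit-learn restores its internal tree arrays, so I am not sure it solves my case.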

Note: I am running the Jupyter notebook from inside a Docker container.
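Since the container may be memory-limited, a kernel restart like this could be the OOM killer rather than a joblib problem. A quick check I tried is comparing the container's cgroup memory limit with the model's size; the helper below is a hypothetical sketch (the cgroup file paths differ between cgroup v1 and v2, and neither may exist outside a container):

```python
import os

def container_memory_limit_bytes():
    """Best-effort read of the cgroup memory limit; returns None if unlimited
    or if no cgroup memory file is found (e.g. outside a container)."""
    for path in ('/sys/fs/cgroup/memory.max',                     # cgroup v2
                 '/sys/fs/cgroup/memory/memory.limit_in_bytes'):  # cgroup v1
        if os.path.exists(path):
            raw = open(path).read().strip()
            return None if raw == 'max' else int(raw)
    return None

print('container memory limit (bytes):', container_memory_limit_bytes())
```

If the limit is well below what the unpickled estimator needs (which can be several times the on-disk size), raising the container's memory limit (e.g. `docker run -m`) might be the actual fix.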

0 Answers:

No answers yet