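I am trying to use scikit-learn's GridSearchCV, and I get a PicklingError whenever I set n_jobs=-1 in GridSearchCV. I searched the forums for a solution but could not find anything useful. The error seems to come from the multiprocessing module, which fails to serialize an object in the parallel processing environment. I am not using any custom classes here, only standard sklearn transformers and estimators. Any help resolving this would be greatly appreciated. Thanks.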
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from nltk.tokenize import WordPunctTokenizer
X = np.asarray(['This is a sample text',
                'Here is another one',
                'How about this?',
                'Some random text again',
                'Need to make this work',
                'sklearn is awesome!',
                'Adding more train data',
                'and more and more data'
                ])
y = np.asarray([1, 2, 2, 1, 1, 1, 2, 1])
pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(ngram_range=(1, 3), tokenizer=WordPunctTokenizer().tokenize, stop_words='english')),
    ('estimator', LogisticRegression(class_weight='balanced'))
])
param_grid = dict()
param_grid['vectorizer__sublinear_tf'] = [True, False]
param_grid['vectorizer__smooth_idf'] = [True, False]
param_grid['vectorizer__norm'] = ['l1', 'l2']
param_grid['estimator__penalty'] = ['l1', 'l2']
grid_clf = GridSearchCV(pipeline, param_grid, verbose=2, n_jobs=-1, scoring='f1_micro')
grid_clf.fit(X, y)
print "\nBest parameters:", grid_clf.best_params_
print "Best score:", grid_clf.best_score_, "\n"
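Since the failure only appears with n_jobs=-1, it may help to check which pipeline parameter refuses to pickle before running the search. The following is a minimal sketch, not part of the original code: the helper name find_unpicklable is hypothetical, and the sample dictionary is a stand-in so the snippet runs without sklearn installed.

```python
import pickle


def find_unpicklable(params):
    """Return {name: error message} for every value that pickle rejects."""
    failures = {}
    for name, value in sorted(params.items()):
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle raises several different exception types
            failures[name] = "%s: %s" % (type(exc).__name__, exc)
    return failures


if __name__ == "__main__":
    # Stand-in values for illustration; a lambda is used because pickle rejects it.
    sample = {"ngram_range": (1, 3), "tokenizer": lambda text: text.split()}
    for name, reason in find_unpicklable(sample).items():
        print("%s -> %s" % (name, reason))
```

With the pipeline above, one could call find_unpicklable(pipeline.get_params()). Container entries such as the whole vectorizer step would presumably be reported alongside the offending parameter, so the most deeply nested name is the one to look at.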
Error:
Fitting 3 folds for each of 16 candidates, totalling 48 fits
---------------------------------------------------------------------------
PicklingError Traceback (most recent call last)
<ipython-input-17-04b1aa29a4ee> in <module>()
30
31 grid_clf = GridSearchCV(pipeline, param_grid, verbose=2, n_jobs=-1, scoring='f1_micro')
---> 32 grid_clf.fit(X, y)
33
34 print "\nBest parameters:", grid_clf.best_params_
....
PicklingError: Can't pickle <type 'instancemethod'>: it's not found as __builtin__.instancemethod
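The traceback complains about an instancemethod, and the only bound method passed in the code above is WordPunctTokenizer().tokenize. One possible workaround, assuming that this is the object that cannot be pickled, is to pass a plain module-level function as the tokenizer instead. The sketch below is an illustration only: the name word_punct_tokenize is hypothetical, and the regular expression is assumed to match the pattern NLTK's WordPunctTokenizer uses. It has not been verified against this exact setup.

```python
import pickle
import re

# Assumed to mirror the pattern used by NLTK's WordPunctTokenizer:
# runs of word characters, or runs of non-word, non-space characters.
_WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Module-level tokenizer; pickle stores it by name, not by value."""
    return _WORD_PUNCT.findall(text)


if __name__ == "__main__":
    # Round-trip through pickle to confirm the function can be serialized.
    restored = pickle.loads(pickle.dumps(word_punct_tokenize))
    print(restored("How about this?"))  # ['How', 'about', 'this', '?']
```

It would then be passed as TfidfVectorizer(ngram_range=(1, 3), tokenizer=word_punct_tokenize, stop_words='english'). Because pickle stores functions by reference, the function probably needs to live at the top level of a module that the worker processes can import, rather than inside another function or class.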