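When I try to run a machine learning pipeline with grid search, I get the error below. I'm not sure where it comes from, because the grid search parameters appear to be named correctly and to have valid values.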
"ValueError: Invalid parameter min_samples_split for estimator MultiOutputClassifier(estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False),
n_jobs=None). Check the list of available parameters with `estimator.get_params().keys()`.
"
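from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import FeatureUnion, Pipeline

# tokenize and StartingVerbExtractor are user-defined and not shown here.
model = Pipeline([
    ('features', FeatureUnion([
        ('text_pipeline', Pipeline([
            ('vect', CountVectorizer(tokenizer=tokenize)),
            ('tfidf', TfidfTransformer())
        ])),
        ('starting_verb', StartingVerbExtractor())
    ])),
    ('clf', MultiOutputClassifier(RandomForestClassifier()))
])

parameters = {
    'features__text_pipeline__vect__ngram_range': ((1, 1), (1, 2)),
    'features__text_pipeline__vect__max_df': (0.5, 0.75, 1.0),
    'features__text_pipeline__vect__max_features': (None, 5000, 10000),
    'clf__n_estimators': [50, 100, 200],
    'clf__min_samples_split': [2, 3, 4]
}

cv = GridSearchCV(model, param_grid=parameters, verbose=2, n_jobs=4)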
Answer 0 (score: 0)
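I found the source of the error. The RandomForestClassifier is wrapped in a MultiOutputClassifier, so its parameters have to be addressed through the wrapper's estimator attribute: clf__estimator__min_samples_split rather than clf__min_samples_split. I changed the parameters to the following: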
parameters = {
    'features__text_pipeline__vect__ngram_range': ((1, 1), (1, 2)),
    'features__text_pipeline__vect__max_df': (0.5, 0.75, 1.0),
    'features__text_pipeline__vect__max_features': (None, 5000, 10000),
    'clf__estimator__n_estimators': [50, 100, 200],
    'clf__estimator__min_samples_split': [2, 3, 4]
}
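
The error message itself suggests checking estimator.get_params().keys(). Below is a minimal sketch of that check. It assumes a simplified stand-in pipeline (no FeatureUnion, and without the question's custom tokenize and StartingVerbExtractor), so the step names differ from the original; only the clf step matches.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Simplified stand-in for the pipeline in the question (an assumption made
# for illustration); it is only used to inspect parameter names.
model = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier())),
])

# Every name in this set is a valid key for GridSearchCV's param_grid.
valid_names = set(model.get_params().keys())

# Look for the parameter named in the error message.
matches = sorted(name for name in valid_names
                 if name.endswith('min_samples_split'))
print(matches)

# The short name from the question is absent; the nested name is present.
has_short_name = 'clf__min_samples_split' in valid_names
has_nested_name = 'clf__estimator__min_samples_split' in valid_names
print(has_short_name, has_nested_name)
```

Running the same check on the full pipeline before building the grid would likely catch this kind of naming mistake early, since each level of nesting (pipeline step, then wrapper, then wrapped estimator) adds one double-underscore segment to the key.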