Is it possible to use RFE and GridSearch together with ClassifierChain?
A small snippet of my current approach:
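from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.multioutput import ClassifierChain
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
sss = StratifiedShuffleSplit(n_splits=7, test_size=600, random_state=1)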
clf = ClassifierChain(LinearSVC(), order='random', cv=sss, random_state=3)
rfe = RFE(estimator=clf, step=100, n_features_to_select=0.93)
features_array = rfe.fit_transform(features_array, labels_data.values.ravel())
parameters = {
"clf__base_estimator": (LinearSVC(C=1e-20), ),
"clf__order": ([0,1,2,3,4,5,6],),
"clf__cv": (sss, ),
"clf__random_state": (None, 0, 1, 2)
}
pipeline = Pipeline([('clf', clf)])
grid_search = GridSearchCV(pipeline, parameters, scoring = scoring, refit = 'rec', n_jobs=-1, cv=sss, error_score='raise', return_train_score=True, verbose=True)
grid_search.fit(features_array, labels_data.values.ravel())
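This does not work: when I run it, I get a large number of exceptions. I am using an Anaconda Jupyter Notebook, so some of the exceptions come from the runtime environment as well. Maybe one of you has already solved this, or can see at first glance what the problem is; otherwise I will post the exceptions.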
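One thing that may be contributing here: RFE ranks features through the `coef_` or `feature_importances_` attribute of the fitted estimator, and ClassifierChain does not expose either of them on the chain itself. A possible workaround, which is a different technique from wrapping the whole chain in RFE, is to put RFE inside each link of the chain so that every label gets its own feature selection. The sketch below assumes synthetic data from `make_multilabel_classification` in place of `features_array` and `labels_data`, and the values `n_features_to_select=10`, `step=2` and `dual=False` are illustrative choices only.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import RFE
from sklearn.multioutput import ClassifierChain
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for features_array / labels_data (assumption, not the real data).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)

# Each link of the chain is a small pipeline: RFE on a LinearSVC, then a LinearSVC.
# RFE can rank features here because LinearSVC exposes coef_ after fitting.
link = Pipeline([
    ("rfe", RFE(estimator=LinearSVC(dual=False), n_features_to_select=10, step=2)),
    ("svc", LinearSVC(dual=False)),
])

chain = ClassifierChain(link, order="random", random_state=3)
chain.fit(X, Y)  # Y is a 2-D indicator matrix, one column per label

# Number of features kept by the RFE step of each fitted link.
kept = [est.named_steps["rfe"].n_features_ for est in chain.estimators_]
pred = chain.predict(X)
print(kept, pred.shape)
```

Note that each link selects among the original features plus the labels of the earlier links in the chain, so the selected subsets can differ from label to label.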
Edit: new approach without RFE:
scoring = {'rec': 'recall', 'acc': 'accuracy', 'prec': 'precision', 'rocauc': 'roc_auc'}
tfidf = TfidfVectorizer()  # arguments omitted
features_array = tfidf.fit_transform(clean_features_ref)
clf = ClassifierChain(LinearSVC(), order='random', cv=sss, random_state=3)
parameters = {
'base_estimator__C': (1e-14,1e-10, 1e-4)
}
grid_search = GridSearchCV(clf, parameters, scoring = scoring, refit = 'rec', n_jobs=-1, cv=sss,
error_score='raise', return_train_score=True, verbose =True)
grid_search.fit(features_array, labels_data.values.ravel())
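For comparison, here is a minimal sketch of a grid search over `C` of the inner LinearSVC that runs on multi-label data. Several things in it are assumptions rather than facts about the setup above: the data is synthetic, the `C` values are illustrative, the outer `cv` is a plain integer, and the scorers are the micro-averaged variants, because binary scorers such as `'recall'` expect a single binary target. The name of the nested parameter seems to depend on the scikit-learn version (`base_estimator` in older releases, `estimator` in newer ones), so the sketch looks it up with `get_params()` instead of hard-coding it.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import ClassifierChain
from sklearn.svm import LinearSVC

# Synthetic stand-in for features_array / labels_data (assumption, not the real data).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)

chain = ClassifierChain(LinearSVC(dual=False), order="random", random_state=3)

# Pick whichever name this scikit-learn version uses for the wrapped estimator.
inner = "estimator" if "estimator" in chain.get_params() else "base_estimator"
param_grid = {f"{inner}__C": [1e-4, 1e-2, 1.0]}  # illustrative values

# Multi-label friendly scorers; 'accuracy' is subset accuracy for 2-D targets.
scoring = {"rec": "recall_micro", "prec": "precision_micro", "acc": "accuracy"}

grid_search = GridSearchCV(
    chain, param_grid, scoring=scoring, refit="rec", cv=3, return_train_score=True
)
grid_search.fit(X, Y)  # 2-D label matrix, not a raveled vector

print(grid_search.best_params_)
best_pred = grid_search.best_estimator_.predict(X)
```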
Edit_2: So I just found out that even this does not work:
clf = ClassifierChain(LinearSVC(), order='random', cv=sss, random_state=3)
clf.fit(features_array, labels_data.values.ravel())
It raises:
IndexError: tuple index out of range
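A possible explanation, offered as a guess rather than a confirmed diagnosis: ClassifierChain is a multi-label estimator and appears to read the number of labels from the second axis of the target, so a 1-D target such as the result of `.ravel()` has no `shape[1]` to index. Separately, the `cv` argument of ClassifierChain is, as far as I can tell, used for cross-validated predictions that need folds partitioning the data, which a shuffle-split does not provide. The sketch below uses synthetic data (an assumption standing in for `features_array` and `labels_data`), passes the labels as a 2-D indicator matrix, and uses an integer `cv`.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import ClassifierChain
from sklearn.svm import LinearSVC

# Synthetic stand-in for features_array / labels_data (assumption, not the real data).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)

print(Y.shape)          # 2-D: one column per label
print(Y.ravel().shape)  # 1-D: there is no second axis to read the label count from

# Fit with the 2-D label matrix instead of a raveled vector.
# cv=3 is an illustrative choice: an integer gives k-fold partitions of the data.
chain = ClassifierChain(LinearSVC(dual=False), order="random", cv=3, random_state=3)
chain.fit(X, Y)

pred = chain.predict(X)  # one predicted column per label
print(pred.shape)
```

If `labels_data` really holds a single label column, the chain has nothing to link together, and a plain LinearSVC may be the more natural fit.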