Multilabel grid search in scikit-learn

Asked: 2014-06-16 03:31:46

Tags: python machine-learning scikit-learn

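I am new to scikit-learn, and I want to use its GridSearchCV to find the best parameters for a multilabel classification problem. I can't get it to work, and I'm fairly sure something is wrong with the labels.

My code looks like this: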

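import numpy as np
from sklearn.datasets import load_svmlight_file
from sklearn.decomposition import RandomizedPCA
from sklearn.grid_search import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# TRAIN_FILE is the path to the training data in svmlight format
X, Y = load_svmlight_file(TRAIN_FILE, dtype=np.float64, multilabel=True)

# one PCA + SVC pipeline per label
clf_pipeline = OneVsRestClassifier(
    Pipeline([('pca', RandomizedPCA()),
              ('clf', SVC())]))

# grid search parameters
c_range = 10.0 ** np.arange(-2, 9)
gamma_range = 10.0 ** np.arange(-5, 4)
n_components_range = (10, 100, 200)
degree_range = (1, 2, 3, 4)  # degree only affects kernel='poly'

# grid search; note that the SVC parameter is a capital C
param_grid = dict(estimator__clf__gamma=gamma_range,
                  estimator__clf__C=c_range,
                  estimator__clf__degree=degree_range,
                  estimator__pca__n_components=n_components_range)

grid = GridSearchCV(clf_pipeline, param_grid, verbose=2)
grid.fit(X, Y)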

The error occurs at the line "grid.fit(X, Y)". Traceback:

File "C:\Python27\lib\site-packages\sklearn\grid_search.py", line 597, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
File "C:\Python27\lib\site-packages\sklearn\grid_search.py", line 359, in _fit
    cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1361, in _check_cv
    cv = StratifiedKFold(y, cv, indices=needs_indices)
File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 429, in __init__
    label_test_folds = test_folds[y == label]
IndexError: too many indices for array

I am using scikit-learn 0.15.

EDIT1. The code runs fine on Linux, but it fails on Windows 7 64-bit.
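
Since the question suspects the labels, it may help to look at their shape. With multilabel=True, load_svmlight_file returns Y as a list of tuples (one tuple of label ids per sample) rather than a 1-D array. The sketch below is plain Python with made-up labels and a hypothetical helper named to_indicator_matrix; neither comes from the original post. It shows how such a list maps to a 2-D binary indicator matrix, the kind of output sklearn.preprocessing.MultiLabelBinarizer produces. Whether converting the labels this way resolves the IndexError on Windows is not confirmed here; treat it only as an illustration of the label format.

```python
def to_indicator_matrix(label_tuples):
    """Convert a list of label tuples into (classes, binary indicator rows).

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    # Collect every distinct label and fix a column order.
    classes = sorted({label for labels in label_tuples for label in labels})
    column = {label: i for i, label in enumerate(classes)}

    matrix = []
    for labels in label_tuples:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1  # mark each label present for this sample
        matrix.append(row)
    return classes, matrix


# Made-up labels in the list-of-tuples shape (assumption: four samples,
# three distinct label ids, one sample with no labels at all).
Y_example = [(1.0, 3.0), (2.0,), (), (1.0, 2.0, 3.0)]

classes, Y_indicator = to_indicator_matrix(Y_example)
print(classes)
for row in Y_indicator:
    print(row)
```

Each row of Y_indicator corresponds to one sample and each column to one label, so the result is rectangular even when samples carry different numbers of labels.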

0 answers:

No answers yet.