ValueError: unknown is not supported for a multiclass target variable

Time: 2017-07-31 03:41:06

Tags: python scikit-learn

I am trying to fit a model to a vector in sklearn, but I get this error:

ValueError: unknown is not supported

Here is my code:

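from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# df_features and df_train are my pandas DataFrames (not shown here)
X = df_features.values
X = X.reshape((len(X), len(df_features.columns)))
Y = df_train['action'].values
Y = Y.reshape((len(Y),))

pipeline = Pipeline([
    ('clf', RandomForestClassifier())
])

parameters = {
    'clf__max_depth': [5, 7, 9],
    'clf__max_features': [3, 4, 5],
    'clf__min_samples_leaf': [3, 4, 5, 6, 7],
    'clf__bootstrap': [True]
}

# Weighted F1 as the scoring function
score_func = make_scorer(metrics.f1_score, average='weighted')

grid_search = GridSearchCV(pipeline, parameters, n_jobs=3,
                           verbose=1, scoring=score_func)

grid_search.fit(X, Y)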

Here is a sample of the Y data:

['NOTHING', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'NOTHING', 'NOTHING']

How can I fix this? Thanks.

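1 answer:

Answer 0: (score: 0)

Please check the type and shape of x and y. Also, do you have enough samples for the max_depth and min_samples_leaf values you are searching over?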
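One way to run that check, as a minimal sketch: scikit-learn's type_of_target helper (in sklearn.utils.multiclass) reports how a label array is interpreted. The y_raw array below is a hypothetical stand-in for df_train['action'].values, assuming the column came through as object dtype with non-string elements. Whether that matches the asker's data is an assumption.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for df_train['action'].values: an object-dtype
# array whose elements are not Python strings.
y_raw = np.array([0, 1, 1, 0, 0, 0, 0], dtype=object)
print(y_raw.dtype, y_raw.ndim)           # dtype and number of dimensions
print(type_of_target(y_raw))             # 'unknown': classifiers and scorers reject this

# Casting to a concrete dtype gives scikit-learn a target it recognizes.
y_int = y_raw.astype(int)
print(type_of_target(y_int))             # 'binary'

# String labels work too, as long as every element really is a string.
y_str = np.array(['NOTHING', 'NOTHING', 'SELL', 'SELL', 'NOTHING'],
                 dtype=object).astype(str)
print(type_of_target(y_str))             # 'binary'

# Per-class sample counts, to compare against min_samples_leaf and the CV folds.
labels, counts = np.unique(y_str, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

If the type comes back as 'unknown', casting the labels with astype(str) or astype(int) is usually enough. The class counts also show whether each class has enough samples for the chosen min_samples_leaf and number of cross-validation folds.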

The following example seems to work fine. I used the iris data with leave-one-out cross-validation as an example.

import numpy as np
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.pipeline import Pipeline

# Leave-one-out CV: one fold per sample
loo = LeaveOneOut()
data = load_iris()

# Use the first 14 iris samples as features; x.shape is (14, 4)
x = data.data[0:14, :]

# String class labels, kept as a 1-D array of dtype str
y = ['NOTHING', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'NOTHING', 'SELL',
     'SELL', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'NOTHING', 'NOTHING']
y = np.asarray(y).astype('str')

pipeline = Pipeline([('clf', RandomForestClassifier())])

parameters = {
    'clf__max_depth': [1, 2, 3],
    'clf__max_features': [1, 2, 3],
    'clf__min_samples_leaf': [1, 2, 3],
    'clf__bootstrap': [True]
}

# Weighted F1 as the scoring function
score_func = make_scorer(metrics.f1_score, average='weighted')

grid_search = GridSearchCV(pipeline, parameters, n_jobs=1, verbose=1,
                           scoring=score_func, cv=loo)

grid_search.fit(x, y)

Result:

Fitting 14 folds for each of 45 candidates, totalling 630 fits
[Parallel(n_jobs=1)]: Done 630 out of 630 | elapsed:   33.7s finished

Hope this helps.