ValueError: Unknown label type: array([0.11], ...) when fitting an ExtraTrees model

Asked: 2016-02-08 09:09:23

Tags: python scikit-learn

I am trying to use an ExtraTreesClassifier on this dataset, and for some reason, at the

model.fit(trainx,trainy)

step, it throws a

ValueError: Unknown label type: array([[ 0.11],
       [ 0.12],
       [ 0.64],
       [ 0.83],
       [ 0.33],
       [ 0.72],
       [ 0.49],

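error. The array([[ 0.11], ... is my training target data. I have searched Stack Overflow, and apparently the error is caused by sklearn not recognizing the data type, but I have tried everything from

trainy = np.asarray(trainy,dtype=float)
trainy=trainy.astype(float)

and it doesn't work, even though type(trainy) shows that it is a numpy.ndarray. Can anyone point me in the right direction?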

Here is the code:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics
from sklearn.ensemble import ExtraTreesClassifier
from sklearn import cross_validation


def preProcess():
    df= pd.read_csv('C:/Users/X/Desktop/Managerial_and_Decision_Economics_2013_Video_Games_Dataset.csv',encoding ='ISO-8859-1')
    #drop non EA
    df = df[df['EA'] ==1]
    #change categorical variables
    le = LabelEncoder()
    nonnumeric_columns=['Console','Title','Publisher','Genre']
    for feature in nonnumeric_columns:
        df[feature] = le.fit_transform(df[feature])
    #set dataset and target variables
    dataset =df.ix[:, df.columns != 'US Sales (millions)']
    target = df['US Sales (millions)']

    trainx, testx, trainy, testy = cross_validation.train_test_split(
        dataset, target, test_size=0.3, random_state=0)
    #attempt to fix error?
    trainx=np.array(trainx)
    trainy = np.asarray(trainy, dtype="float")
    return trainx,testx,trainy,testy

def classifier():
    model =  ExtraTreesClassifier(n_estimators=250,
                              random_state=0)
    model.fit(trainx,trainy)
    return model.score(testx,testy)


trainx,testx,trainy,testy=preProcess()

I am using scikit-learn 0.17 on Python 3.5.

2 Answers:

Answer 0: (score: 6)

Your labels, [[0.11], [0.12], ...], look like regression values (continuous numbers), not class labels. You should use ExtraTreesRegressor instead of ExtraTreesClassifier.

From the source code of ForestClassifier:

 y : array-like, shape = [n_samples] or [n_samples, n_outputs]
            The target values (class labels in classification, real numbers in
            regression).
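
A minimal sketch of this suggestion follows. It assumes synthetic data in place of the video-game dataset, and the variable names are illustrative rather than the asker's. `type_of_target` shows why casting to float does not help, and `ExtraTreesRegressor` fits the same targets without error:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, ExtraTreesRegressor
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the real dataset (assumption: 100 rows, 4 features).
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = np.round(rng.rand(100), 2)  # continuous, sales-like target values

# Casting to float changes nothing: the values themselves are continuous.
target_kind = type_of_target(y)
print(target_kind)  # 'continuous' for non-integer floats

# The classifier rejects continuous targets with a ValueError...
try:
    ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)
    classifier_failed = False
except ValueError as exc:
    classifier_failed = True
    print("classifier error:", exc)

# ...while the regressor accepts them.
reg = ExtraTreesRegressor(n_estimators=10, random_state=0).fit(X, y)
r2_train = reg.score(X, y)  # R^2 on the training data
print(r2_train)
```

Note that `score` on a regressor returns R^2 rather than accuracy, so the number is not comparable to a classifier's score.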

Answer 1: (score: 1)

My array had floats in it, and I got the same error when creating the one-hot encoding.

training_labels = np.append(training_labels, [label])
...
y_one_hot = label_binarizer.fit_transform(training_labels)
ValueError: Unknown label type: (array([ 0. ,  0.1,

Since I am doing classification, I had to convert them to strings:

training_labels = np.append(training_labels, [str(label)])
['0.0' '0.1' '-0.2' ..., '0.0' '0.0' '0.1']
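
The workaround above can be reproduced end to end with a rough sketch like the following. The label values are hypothetical stand-ins for `training_labels`, and `LabelBinarizer` is assumed to be what `label_binarizer` refers to:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical float labels standing in for training_labels.
float_labels = np.array([0.0, 0.1, -0.2, 0.0, 0.1])

# Non-integer floats are treated as continuous targets, so this raises.
try:
    LabelBinarizer().fit_transform(float_labels)
    float_failed = False
except ValueError:
    float_failed = True

# Converting each label to a string makes them discrete class names.
string_labels = np.array([str(v) for v in float_labels])
binarizer = LabelBinarizer()
y_one_hot = binarizer.fit_transform(string_labels)

print(binarizer.classes_)  # the distinct string labels, sorted
print(y_one_hot.shape)     # (n_samples, n_classes)
```

This only makes sense when the floats really are a small set of category codes. If they are measurements, regression is likely the better fit.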