ValueError when calling cross_val_score

Time: 2019-12-06 09:19:47

Tags: python machine-learning

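I am trying to set up a machine learning project in which I evaluate the accuracy of several algorithms. I am using this CSV and load only the date, time, and CO columns (I renamed them by hand in the CSV). After preparing the training data I try to run the evaluation, but I get: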

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'unknown' instead.

The arrays used for the evaluation (X_train and Y_train) have the following shapes:

(9357, 2)
(9357,)
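The 'unknown' in the error message is the label that scikit-learn's `type_of_target` helper assigns to the target array. A minimal sketch of how to check this, assuming (a guess about the data, not something shown in the question) that `dataset.values` is an object-dtype array because it mixes date/time strings with numeric CO readings. The sample rows are made up for illustration:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up rows standing in for the date, time and CO columns (not real data).
array = np.array(
    [
        ["2019-01-01", "00:00:00", 2.6],
        ["2019-01-01", "01:00:00", 2.0],
        ["2019-01-01", "02:00:00", 2.2],
        ["2019-01-01", "03:00:00", 1.6],
    ],
    dtype=object,
)

y = array[:, 2]
kind_raw = type_of_target(y)                  # object array of floats
kind_float = type_of_target(y.astype(float))  # same values, float dtype

print(y.dtype, kind_raw, kind_float)  # object unknown continuous
```

If the real target behaves the same way, casting it to float is not enough for `StratifiedKFold`: a 'continuous' target is still rejected, because stratification needs discrete class labels.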

The class:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


class Models:
    test_size: float
    random_state: int

    def __init__(self, test_size: float = 0.20, random_state: int = 1) -> None:
        super().__init__()
        self.test_size = test_size
        self.random_state = random_state

    @staticmethod
    def init_models() -> []:
        return [
            ('LR', LogisticRegression(solver='liblinear', multi_class='ovr')),
            ('LDA', LinearDiscriminantAnalysis()),
            ('KNN', KNeighborsClassifier()),
            ('CART', DecisionTreeClassifier()),
            ('NB', GaussianNB()),
            ('SVM', SVC(gamma='auto'))
        ]

    def train(self, x: [], y: []):
        x_train, x_validation, y_train, y_validation = train_test_split(x, y, test_size=self.test_size,
                                                                        random_state=self.random_state)
        return x_train, x_validation, y_train, y_validation

    def evaluate(self, x_train: [], y_train: [], splits: int = 10, random_state: int = 1):
        results = []
        names = []
        models = self.init_models()
        for name, model in models:
            # shuffle=True is required for random_state to have any effect
            kfold = StratifiedKFold(n_splits=splits, shuffle=True, random_state=random_state)
            cv_results = cross_val_score(estimator=model, X=x_train, y=y_train, cv=kfold, scoring='accuracy')
            results.append(cv_results)
            names.append(name)
            print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))

I call my class like this:

models_helper = Models()
array = dataset.values
X = array[:, 1:3]
Y = array[:, 2]

prepared = models_helper.train(X, Y)

classification = models_helper.evaluate(prepared[0], prepared[2])
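Two details in the call above are worth checking. `array[:, 1:3]` selects columns 1 and 2, so the target column (index 2) also ends up inside `X`. In addition, `StratifiedKFold` with the accuracy scorer needs discrete class labels. The sketch below is one possible way to handle both, under assumptions not stated in the question: the column layout (date, time, CO), the synthetic values, the weekday/hour features, and the hypothetical threshold of 2.0 used to turn CO into two classes are all made up for illustration.

```python
from datetime import datetime, timedelta

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for dataset.values (date, time, CO); all values are made up.
rng = np.random.default_rng(0)
start = datetime(2019, 1, 1, 0)
stamps = [start + timedelta(hours=i) for i in range(240)]
array = np.array(
    [[t.strftime("%d/%m/%Y"), t.strftime("%H:%M:%S"), rng.uniform(0.5, 4.0)]
     for t in stamps],
    dtype=object,
)

# array[:, 1:3] covers columns 1 and 2, so its last column is the target itself.
overlap = np.array_equal(array[:, 1:3][:, 1], array[:, 2])
print(overlap)  # True

# Features: columns 0 and 1 only, converted to numbers (weekday and hour).
weekday = np.array([datetime.strptime(d, "%d/%m/%Y").weekday() for d in array[:, 0]])
hour = np.array([int(t[:2]) for t in array[:, 1]])
X = np.column_stack([weekday, hour])

# Target: cast to float, then bin with a hypothetical threshold into class labels.
co = array[:, 2].astype(float)
y = (co > 2.0).astype(int)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(DecisionTreeClassifier(random_state=1), X, y,
                         cv=kfold, scoring="accuracy")
print(scores.shape)  # (5,)
```

Whether binning is appropriate depends on the goal. If CO is meant to be predicted as a number, a regressor with `KFold` and a regression metric would be the more natural setup than a classifier with accuracy.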

1 answer:

Answer 0 (score: 0)

I avoided this problem by first computing the predicted values with cross_val_predict and then passing those predictions to metrics.accuracy_score to get the score.

from sklearn import metrics, model_selection


# Function that runs the requested algorithm and returns the accuracy metrics.
# Takes the sklearn model as an argument, along with the cv value and the training data.
def fit_ml_algo(algo, X_train, y_train, cv):
    # One pass: fit on the full training set and score on that same data
    model = algo.fit(X_train, y_train)
    acc = round(model.score(X_train, y_train) * 100, 2)

    # Cross-validation: out-of-fold predictions for every training sample
    train_pred = model_selection.cross_val_predict(algo,
                                                   X_train,
                                                   y_train,
                                                   cv=cv,
                                                   n_jobs=-1)
    # Cross-validation accuracy metric
    acc_cv = round(metrics.accuracy_score(y_train, train_pred) * 100, 2)

    return train_pred, acc, acc_cv
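For reference, the two scikit-learn calls this answer relies on can be exercised on synthetic data as follows. This is only a sketch: `make_classification`, the `DecisionTreeClassifier`, and `cv=5` are choices made here for illustration and are not part of the answer.

```python
from sklearn import metrics, model_selection
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (stand-in for real training data).
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)
algo = DecisionTreeClassifier(random_state=0)

# Out-of-fold predictions: each sample is predicted by a model that did not see it.
train_pred = model_selection.cross_val_predict(algo, X_train, y_train, cv=5)

# Accuracy of the out-of-fold predictions against the true training labels.
acc_cv = round(metrics.accuracy_score(y_train, train_pred) * 100, 2)
print(train_pred.shape, acc_cv)
```

Note that the labels here are already discrete. Classifiers generally still expect class labels when fitting, so with a target like the one in the question this route would likely also need the target to be converted first.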