Question

I'm very new to Python and am trying to run a decision tree model with the following code:

from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import numpy as np
import pandas as pd
import sklearn as skl


data_forecast = pd.read_excel("./Forcast_data_Analytics.xlsx")

x = data_forecast[['Name','Power', 'FirstEventID','AlleventIds']]
y = data_forecast[['Possible_fix','Changes_Required']]

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.8)

classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Sample data:

Name       Power      FirstEventID      AlleventIds         Possible_fix        Changes_Required
India      I3000       10130-1           10130-1, 134-00     yes                 Bug Fix

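Can decision tree classification work without label encoding? Or do I need to encode my data before feeding it to the classifier?

What is the best way to do this? I'd like to treat everything as strings and encode them. After classification, I also want to decode the results.

I tried the following encoding approach, but it didn't work: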

from sklearn.preprocessing import LabelEncoder
vals = np.array(data_forecast)
LabelEncoder = LabelEncoder()
integer_encoded = LabelEncoder.fit_transform(vals)

Error:

Exception has occurred: ValueError
y should be a 1d array, got an array of shape (59, 23) instead.
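
For context on this message: LabelEncoder works on a single 1-D column of labels, while np.array(data_forecast) is a 2-D table (59 rows by 23 columns here), which is what the ValueError reports. Note also that the assignment LabelEncoder = LabelEncoder() shadows the class name. A minimal sketch of encoding one column at a time follows; the small DataFrame and the names df, encoders and encoded are illustrative assumptions, not the real spreadsheet.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the spreadsheet; values are illustrative only.
df = pd.DataFrame({
    "Name": ["India", "India", "Japan"],
    "Possible_fix": ["yes", "no", "yes"],
})

# LabelEncoder expects a 1-D array, so fit one encoder per column
# and keep each fitted encoder for decoding later.
encoders = {}
encoded = pd.DataFrame(index=df.index)
for col in df.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(df[col].astype(str))

# inverse_transform maps the integer codes of a column back to strings.
decoded_fix = encoders["Possible_fix"].inverse_transform(encoded["Possible_fix"])
```

Keeping one fitted encoder per column is what makes decoding possible afterwards, since each encoder remembers its own mapping in classes_.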

What is the correct way to do this? How do I encode/decode my labels and use them?
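
One possible approach, sketched under assumptions rather than offered as the definitive answer: scikit-learn trees need numeric input, so the string features can be encoded with OrdinalEncoder (which accepts 2-D input) and each target column with its own LabelEncoder, which is then reused to decode predictions. The rows below are made up to resemble the sample data, the AlleventIds column is left out for brevity, and the names feature_cols, target_cols and target_encoders are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows shaped like the sample data; not the real spreadsheet.
df = pd.DataFrame({
    "Name": ["India", "India", "Japan", "Japan"],
    "Power": ["I3000", "I4000", "I3000", "I4000"],
    "FirstEventID": ["10130-1", "134-00", "10130-1", "134-00"],
    "Possible_fix": ["yes", "no", "yes", "no"],
    "Changes_Required": ["Bug Fix", "Config Change", "Bug Fix", "Config Change"],
})
feature_cols = ["Name", "Power", "FirstEventID"]
target_cols = ["Possible_fix", "Changes_Required"]

# OrdinalEncoder takes a 2-D block of features and encodes each column.
feature_encoder = OrdinalEncoder()
X = feature_encoder.fit_transform(df[feature_cols].astype(str))

# One LabelEncoder per target column, kept so predictions can be decoded.
target_encoders = {c: LabelEncoder().fit(df[c].astype(str)) for c in target_cols}
Y = np.column_stack(
    [target_encoders[c].transform(df[c].astype(str)) for c in target_cols]
)

# DecisionTreeClassifier accepts a 2-D y (one column per target).
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# Predict on the training rows and map the integer codes back to strings.
pred = clf.predict(X).astype(int)
decoded = pd.DataFrame({
    c: target_encoders[c].inverse_transform(pred[:, i])
    for i, c in enumerate(target_cols)
})
```

With real data the rows would be split with train_test_split before fitting, and new rows would go through feature_encoder.transform rather than fit_transform. Since confusion_matrix does not accept multi-output targets, evaluating each target column separately is likely necessary.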

Python label encoding: decision tree classification

0 answers: