I am very new to Python and am trying to run a decision tree model using the following code:
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import numpy as np
import pandas as pd
import sklearn as skl
data_forecast = pd.read_excel("./Forcast_data_Analytics.xlsx")
x = data_forecast[['Name','Power', 'FirstEventID','AlleventIds']]
y = data_forecast[['Possible_fix','Changes_Required']]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.8)
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
Sample data:
Name   Power  FirstEventID  AlleventIds      Possible_fix  Changes_Required
India  I3000  10130-1       10130-1, 134-00  yes           Bug Fix
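Can decision tree classification work without label encoding, or do I need to encode my data before feeding it to the classifier?
What is the best way to do this? I want to treat every column as a string and encode it. After classification, I also want to decode the predictions back to the original values.
I tried the following encoding approach, but it did not work: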
from sklearn.preprocessing import LabelEncoder
vals = np.array(data_forecast)
LabelEncoder = LabelEncoder()
integer_encoded = LabelEncoder.fit_transform(vals)
Error:
Exception has occurred: ValueError
y should be a 1d array, got an array of shape (59, 23) instead.
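As far as I can tell, the message appears because LabelEncoder is meant for a single 1-D column of labels, while np.array(data_forecast) is a 2-D table. Below is a minimal sketch of the difference. The small frame `df` and its values are made up for illustration and are not the real spreadsheet:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for data_forecast; the values are illustrative only.
df = pd.DataFrame({
    "Possible_fix": ["yes", "no", "yes"],
    "Changes_Required": ["Bug Fix", "Config Change", "Bug Fix"],
})

# Passing the whole 2-D table to LabelEncoder raises ValueError.
try:
    LabelEncoder().fit_transform(df.to_numpy())
    raised = False
except ValueError:
    raised = True

# One encoder per column works, and each keeps its own mapping for decoding.
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}
encoded = pd.DataFrame(
    {col: encoders[col].transform(df[col]) for col in df.columns}
)

# inverse_transform maps the integer codes back to the original strings.
decoded_fix = encoders["Possible_fix"].inverse_transform(encoded["Possible_fix"])
```

Keeping the fitted encoders in a dictionary is one way to make sure the same mapping is available later for decoding.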
What is the correct way to do this? How do I encode and decode my labels and use them with the classifier?
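To make the question concrete, here is a rough sketch of the encode, fit, predict, decode round trip I have in mind. It is one possible approach under stated assumptions, not necessarily the best one: it uses OrdinalEncoder for the feature columns (it accepts 2-D input) and one LabelEncoder per target column. The rows in `df` are invented to mirror the sample data, and the model is fitted and evaluated on the same rows only to keep the example short:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows shaped like the sample data; not the real spreadsheet.
df = pd.DataFrame({
    "Name": ["India", "India", "Japan", "Japan"],
    "Power": ["I3000", "I2000", "I3000", "I2000"],
    "FirstEventID": ["10130-1", "134-00", "10130-1", "134-00"],
    "AlleventIds": ["10130-1, 134-00", "134-00", "10130-1", "134-00, 10130-1"],
    "Possible_fix": ["yes", "no", "yes", "no"],
    "Changes_Required": ["Bug Fix", "Config Change", "Bug Fix", "Config Change"],
})
feature_cols = ["Name", "Power", "FirstEventID", "AlleventIds"]
target_cols = ["Possible_fix", "Changes_Required"]

# Treat every feature as a string and map each column to integer codes.
feature_encoder = OrdinalEncoder()
X = feature_encoder.fit_transform(df[feature_cols].astype(str))

# One LabelEncoder per target column, kept so predictions can be decoded.
target_encoders = {c: LabelEncoder().fit(df[c].astype(str)) for c in target_cols}
y = pd.DataFrame(
    {c: target_encoders[c].transform(df[c].astype(str)) for c in target_cols}
)

# DecisionTreeClassifier accepts a 2-D y (multi-output classification).
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X, y)

# Predict integer codes, then map each target column back to its strings.
y_pred = classifier.predict(X).astype(int)
decoded = pd.DataFrame({
    c: target_encoders[c].inverse_transform(y_pred[:, i])
    for i, c in enumerate(target_cols)
})
print(decoded)
```

Two caveats with this sketch: the integer codes from OrdinalEncoder carry an arbitrary order, and categories that were not seen during fitting will raise an error at transform time unless the encoder is configured to handle unknown values. With two target columns, confusion_matrix and classification_report would probably need to be called once per target column rather than on the whole y.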