Question

I am getting the following error:

C:/Users/HP/.PyCharmCE2019.1/config/scratches/scratch.py
Traceback (most recent call last):
  File "C:/Users/HP/.PyCharmCE2019.1/config/scratches/scratch.py", line 25, in <module>
    dtree.fit(x_train,y_train)
  File "C:\Users\HP\PycharmProjects\untitled\venv\lib\site-packages\sklearn\tree\tree.py", line 801, in fit
    X_idx_sorted=X_idx_sorted)
  File "C:\Users\HP\PycharmProjects\untitled\venv\lib\site-packages\sklearn\tree\tree.py", line 236, in fit
    "number of samples=%d" % (len(y), n_samples))
ValueError: Number of labels=45 does not match number of samples=36

I am using a DecisionTree model, but I get this error. Any help would be appreciated.

#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#reading the dataset
df=pd.read_csv(r'C:\csv\kyphosis.csv')
print(df)
print(df.head())

#visualising the dataset
print(sns.pairplot(df,hue='Kyphosis',palette='Set1'))
plt.show()

#training and testing
from sklearn.modelselection import traintestsplit 
c=df.drop('Kyphosis',axis=1)
d=df['Kyphosis']
xtrain,ytrain,xtest,ytest=traintestsplit(c,d,testsize=0.30)

#Decision_Tree
from sklearn.tree import DecisionTreeClassifier
dtree=DecisionTreeClassifier()
dtree.fit(xtrain,ytrain)

#Predictions
predictions=dtree.predict(xtest)
from sklearn.metrics import classificationreport,confusionmatrix
print(classificationreport(ytest,predictions))
print(confusionmatrix(y_test,predictions))

The expected result is my classification_report and confusion_matrix.

Answer 1

The call dtree.fit(xtrain, ytrain) raises the error because xtrain and ytrain have different lengths.

Check the part of the code that generates them:

xtrain,ytrain,xtest,ytest=traintestsplit(c,d,testsize=0.30)

and compare it with the example in the documentation:

import numpy as np
from sklearn.model_selection import train_test_split
[...]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

You will see two things:

1. traintestsplit should be train_test_split (likewise, modelselection should be model_selection and testsize should be test_size).

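2. Changing the order of the variables on the left-hand side of = assigns different data to them. train_test_split returns X_train, X_test, y_train, y_test in that order, so in your code ytrain receives the test features and xtest receives the training labels.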
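To make the second point concrete, here is a minimal sketch that uses synthetic data in place of kyphosis.csv (the 100-row, 3-column shape and the random values are assumptions for illustration only). It unpacks the result once in the order used in the question and once in the documented order, then compares the lengths that would be passed to fit.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumed shape: 100 rows, 3 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Order used in the question: the second name receives X_test, not y_train.
xtrain, ytrain, xtest, ytest = train_test_split(X, y, test_size=0.30, random_state=0)
wrong_lengths = (len(xtrain), len(ytrain))

# Documented order: X_train, X_test, y_train, y_test.
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.30, random_state=0)
right_lengths = (len(xtrain), len(ytrain))

print("question's order:", wrong_lengths)
print("documented order:", right_lengths)
```

With the question's order the two lengths differ, which is exactly the condition that makes fit raise the ValueError.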

So your code should be:

xtrain, xtest, ytrain, ytest = train_test_split(c,d,test_size=0.30)
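For reference, a rough end-to-end sketch of the corrected script might look like the following. It assumes synthetic arrays instead of the real CSV (the feature values and the "absent"/"present" labels are made up for illustration), and random_state is added only to make the run repeatable.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the features (c) and the 'Kyphosis' labels (d).
rng = np.random.default_rng(42)
c = rng.integers(1, 200, size=(100, 3))
d = rng.choice(["absent", "present"], size=100)

# Unpack in the documented order: X_train, X_test, y_train, y_test.
x_train, x_test, y_train, y_test = train_test_split(
    c, d, test_size=0.30, random_state=42
)

# Fit on matching features and labels, then predict on the held-out rows.
dtree = DecisionTreeClassifier(random_state=42)
dtree.fit(x_train, y_train)
predictions = dtree.predict(x_test)

print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))
```

Because the labels here are random, the scores themselves are meaningless; the point is only that fit and the two metric functions now receive arrays of matching length.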

