I am trying to run the code below.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report
from sklearn import cross_validation
from sklearn.tree import DecisionTreeClassifier
from utilities import visualize_classifier
# Load input data
input_file = 'data_decision_trees.txt'
data = np.loadtxt(input_file, delimiter=',')
X, y = data[:, :-1], data[:, -1]
# Separate input data into two classes based on labels
class_0 = np.array(X[y==0])
class_1 = np.array(X[y==1])
# Visualize input data
plt.figure()
plt.scatter(class_0[:, 0], class_0[:, 1], s=75, facecolors='black',
            edgecolors='black', linewidth=1, marker='x')
plt.scatter(class_1[:, 0], class_1[:, 1], s=75, facecolors='white',
            edgecolors='black', linewidth=1, marker='o')
plt.title('Input data')
# Split data into training and testing datasets
X_train, X_test, y_train, y_test = cross_validation.train_test_split(
    X, y, test_size=0.25, random_state=5)
# Decision Trees classifier
params = {'random_state': 0, 'max_depth': 4}
classifier = DecisionTreeClassifier(**params)
classifier.fit(X_train, y_train)
visualize_classifier(classifier, X_train, y_train, 'Training dataset')
y_test_pred = classifier.predict(X_test)
visualize_classifier(classifier, X_test, y_test, 'Test dataset')
# Evaluate classifier performance
class_names = ['Class-0', 'Class-1']
print("\n" + "#"*40)
print("\nClassifier performance on training dataset\n")
print(classification_report(y_train, classifier.predict(X_train), target_names=class_names))
print("#"*40 + "\n")
print("#"*40)
print("\nClassifier performance on test dataset\n")
print(classification_report(y_test, y_test_pred, target_names=class_names))
print("#"*40 + "\n")
plt.show()
The error occurs on this line:
from sklearn import cross_validation
I tried the following in the Anaconda command window:
conda update scikit-learn
pip install -U scikit-learn
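In both cases I got a confirmation saying "Requirement already up-to-date". I'm not sure why I get this error when I run the script, since everything related to sklearn appears to be up to date. Is there anything else I can try? The code should definitely work, but it keeps failing on that import. Thanks.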
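Answer 0 (score: 2)
If your sklearn is up to date, the sklearn.cross_validation module is no longer there: it was deprecated in favor of sklearn.model_selection in version 0.18 and removed in version 0.20. Updating the package therefore cannot bring it back. Import train_test_split from the new module and call it directly, without the cross_validation. prefix. Try: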
from sklearn.model_selection import train_test_split
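If the same script has to run on machines with different scikit-learn releases, one option is to pick the module from the installed version. The following is only a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper names split_module_for and load_train_test_split are hypothetical and not part of scikit-learn.

```python
import re
from importlib import import_module, metadata


def split_module_for(version):
    """Name the module that provides train_test_split for a scikit-learn version string.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # sklearn.model_selection first shipped in 0.18;
    # sklearn.cross_validation was removed in 0.20.
    if (major, minor) >= (0, 18):
        return "sklearn.model_selection"
    return "sklearn.cross_validation"


def load_train_test_split():
    """Import train_test_split from whichever module the installed release provides."""
    # Raises importlib.metadata.PackageNotFoundError if scikit-learn is not installed.
    version = metadata.version("scikit-learn")
    module = import_module(split_module_for(version))
    return getattr(module, "train_test_split")
```

With this in place, the script from the question would call train_test_split = load_train_test_split() once, then X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5). On a single, current installation the plain import shown in the answer is simpler and is all that is needed.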