First I define X and y; part of the data is shown below.
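import numpy as np
from sklearn import svm
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
X = np.array([[11.8, 0., 3.4, 5.7, 0., 5.7],
              [33.4, 6.8, 0., 5.7, 0., 5.7],
              [33.4, 6.8, 0., 5.7, 0., 5.7]])
y = np.array([1., 1., 0.])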
I am plotting a learning curve from the dictionary created in the code below:
# First split: hold out a fixed test set
X_train_prev, X_test_prev, y_train_prev, y_test_prev = train_test_split(X, y, test_size=0.2)
# Store test and training scores in a dictionary, keyed by the fraction of training data that is dropped
test_sizes = np.arange(0.01, 0.9, 0.025)
dicto = {}
for i in test_sizes:
    # A larger i leaves less data for training
    X_train, _, y_train, _ = train_test_split(X_train_prev, y_train_prev, test_size=i)
    clf.fit(X_train, y_train)  # clf is assumed to be defined earlier, e.g. clf = svm.SVC()
    # Score on the test data held out in the first split
    test = clf.score(X_test_prev, y_test_prev)
    train = clf.score(X_train, y_train)
    dicto[i] = test, train
print(dicto)
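The problem is that the test error does not depend on the model. How is that possible? How should I change my code so that the test error depends on the trained model?
Answer 0: (score: 1)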
from sklearn.svm import SVC
from sklearn.datasets import load_iris
data = load_iris()
X = data.data
y = data.target
clf = SVC()
#====
#Your code
#====
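test_training_error = dicto.values()  # (test score, train score) pairs
test_training_error_sorted = sorted(test_training_error, key=lambda e: e[0])  # I think this is important.
from matplotlib import pyplot as plt
# Unzip the sorted pairs into two series; indexing [0] and [1] would pick out only the first two pairs
test_scores, train_scores = zip(*test_training_error_sorted)
plt.plot(test_scores, label="test")
plt.plot(train_scores, label="train")
plt.legend()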
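I ran this on sklearn's built-in data and the result is fine; the figure looks normal. Maybe you should check your data, and sort the values before plotting them.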
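To make the suggestion about checking and sorting concrete, here is a minimal sketch of the same experiment on the iris data. It is one possible way to do it, not necessarily what either poster ran. The helper names score_by_train_fraction and plot_curve, the fixed seed, the stratified splits, and keying the results by the fraction of training data kept (rather than dropped) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def score_by_train_fraction(X, y, fractions, seed=0):
    """Return (fraction kept, test score, train score) rows, sorted by fraction."""
    # Hold out one fixed test set, as in the question.
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    rows = []
    for frac in fractions:
        # Train on only `frac` of the remaining pool.
        X_train, _, y_train, _ = train_test_split(
            X_pool, y_pool, train_size=frac, random_state=seed, stratify=y_pool
        )
        clf = SVC()  # a fresh, untrained model for every training size
        clf.fit(X_train, y_train)
        rows.append(
            (float(frac), clf.score(X_test, y_test), clf.score(X_train, y_train))
        )
    # Sort by training fraction so the curve is drawn left to right.
    return sorted(rows)


def plot_curve(rows):
    """Plot test and train accuracy against the training fraction."""
    from matplotlib import pyplot as plt  # imported lazily; only needed here

    fractions, test_scores, train_scores = zip(*rows)
    plt.plot(fractions, test_scores, label="test")
    plt.plot(fractions, train_scores, label="train")
    plt.xlabel("fraction of training pool used")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()


data = load_iris()
curve = score_by_train_fraction(data.data, data.target, np.linspace(0.1, 0.9, 9))
for frac, test_score, train_score in curve:
    print(f"{frac:.2f}  test={test_score:.3f}  train={train_score:.3f}")
```

Calling plot_curve(curve) draws both series. Printing the rows first makes it easy to see whether the test score really stays constant or whether the plot only looks flat because the points were drawn in an unsorted order.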