Learning curve plot with the test data kept fixed

Time: 2016-06-23 13:06:33

Tags: python for-loop machine-learning

First I defined X and y; part of the data is shown below.

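    import numpy as np
    from sklearn import svm
    # sklearn.cross_validation was replaced by sklearn.model_selection in scikit-learn 0.18
    from sklearn.model_selection import train_test_split

    X = np.array([[11.8, 0., 3.4, 5.7, 0., 5.7],
                  [33.4, 6.8, 0., 5.7, 0., 5.7],
                  [33.4, 6.8, 0., 5.7, 0., 5.7]])

    y = np.array([1., 1., 0.])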

I am plotting a learning curve from the dictionary created in the code below:

    # First split: hold out the test data once
    X_train_prev, X_test_prev, y_train_prev, y_test_prev = train_test_split(X, y, test_size=0.2)

    # Assumed classifier; the question does not show how clf is created
    clf = svm.SVC()

    # Store test and training scores, keyed by the fraction of training data withheld
    test_sizes = np.arange(0.01, 0.9, 0.025)
    dicto = {}

    for i in test_sizes:
        # Subsample the training set; the held-out test set stays the same
        X_train, _, y_train, _ = train_test_split(X_train_prev, y_train_prev, test_size=i)
        clf.fit(X_train, y_train)

        # Score on the fixed test data and on the current training subset
        test = clf.score(X_test_prev, y_test_prev)
        train = clf.score(X_train, y_train)
        dicto[i] = test, train

    print(dicto)

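My learning curve looks like this (image: learning curve).

The problem is that the test error does not depend on the model. How is that possible? How should I change my code so that the test error depends on the trained model?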

1 Answer:

Answer 0: (score: 1)

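    from matplotlib import pyplot as plt
    from sklearn.svm import SVC
    from sklearn.datasets import load_iris

    data = load_iris()
    X = data.data
    y = data.target
    clf = SVC()
    #====
    # Your code
    #====
    # Sort by the withheld fraction (the dictionary key) so the points are plotted in order.
    # I think this is important.
    test_training_error_sorted = sorted(dicto.items(), key=lambda e: e[0])
    fractions = [k for k, _ in test_training_error_sorted]
    test_scores = [v[0] for _, v in test_training_error_sorted]
    train_scores = [v[1] for _, v in test_training_error_sorted]

    plt.plot(fractions, test_scores, label="test")
    plt.plot(fractions, train_scores, label="train")
    plt.legend()
    plt.show()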

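I ran it with sklearn's iris data and the result is fine; the figure looks normal. Maybe you should check your data, and sort the results before plotting the graph.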
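
For reference, here is a minimal end-to-end sketch of the fixed-test-set loop on the iris data. It is not the asker's exact setup: the helper name `learning_curve_fixed_test`, the `fractions` grid, the `seed` value and the stratified splits are my own assumptions for illustration, and it imports from `sklearn.model_selection`, which replaced `sklearn.cross_validation` in newer scikit-learn releases. Results are collected as tuples and sorted by training-set size, so they can be plotted in order.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def learning_curve_fixed_test(X, y, fractions, seed=0):
    """Return sorted (n_train, test_score, train_score) tuples for one fixed test set.

    Hypothetical helper written for this example; it is not part of scikit-learn.
    """
    # Hold out the test set once; every model below is scored on it.
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    results = []
    for frac in fractions:
        # Keep only `frac` of the training pool for this fit.
        X_train, _, y_train, _ = train_test_split(
            X_pool, y_pool, train_size=frac, random_state=seed, stratify=y_pool
        )
        clf = SVC()
        clf.fit(X_train, y_train)
        results.append(
            (len(X_train), clf.score(X_test, y_test), clf.score(X_train, y_train))
        )
    # Sort by training-set size so the curve is drawn left to right.
    return sorted(results)


iris = load_iris()
fractions = np.arange(0.1, 0.91, 0.1)  # assumed grid: 10% to 90% of the training pool
curve = learning_curve_fixed_test(iris.data, iris.target, fractions)
for n_train, test_score, train_score in curve:
    print(n_train, round(test_score, 3), round(train_score, 3))
```

If the test score printed here changes with `n_train` while yours stays flat, the difference is most likely in the data or the classifier settings rather than in the loop itself. Note that `clf.score` returns accuracy, so the error is `1 - score`.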