Converting labels to an indicator matrix after splitting the dataset into training and test sets

Asked: 2018-07-10 19:12:31

Tags: python pandas machine-learning neural-network

Converting Y to an indicator matrix works fine:

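import numpy as np

file = 'dataset.csv'
X, Y = readFile(file)  # readFile is my own loader (not shown)
K = 9                  # number of classes
N = len(Y)
T = np.zeros((N, K))   # one row per sample, one column per class
for i in range(N):
    T[i, Y[i]] = 1     # mark the column of sample i's class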

But when I do the same thing after splitting into training and test sets, like this, I get an error:

X, Y = shuffle(X, Y)
Ntrain = int(0.7*len(X))
Xtrain, Ytrain = X[:Ntrain], Y[:Ntrain]
Xtest, Ytest = X[Ntrain:], Y[Ntrain:]
N1 = len(Ytrain)
T1 = np.zeros((N1, K))  
for i in range(N1):
    T1[i, Ytrain[i]] = 1 

The error is raised on the last line, T1[i, Ytrain[i]] = 1. Where am I going wrong? K is the number of classes (9).

print(np.unique(Y))
print(np.unique(Ytrain))

The print statements above output:

[0 1 2 3 4 5 6 7 8]
[0 1 2 3 4 5 6 7 8]


2 Answers:

Answer 0 (score: 1)

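T1 has shape N1 x K, and you are trying to set a value at column index Ytrain[i]. If Ytrain[i] >= K, NumPy raises an IndexError. A KeyError, on the other hand, comes from the lookup Ytrain[i] itself, which can happen when Ytrain is a pandas Series whose index no longer contains the label i. Printing i inside the loop shows which row fails.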

Update:

for i in range(N1):
    print(i)
    T1[i, Ytrain[i]] = 1
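
A minimal sketch of the two ways this loop can fail. It assumes Ytrain is a pandas Series that kept its original index labels after shuffling; the question does not show what readFile returns, so that is an assumption, and the labels and index values below are made up for illustration.

```python
import numpy as np
import pandas as pd

K = 3
# Hypothetical labels; the index mimics a shuffled Series that kept its old labels.
Y = pd.Series([2, 0, 1, 1], index=[3, 0, 2, 1])
Ytrain = Y.iloc[:2]  # first two rows by position; index labels are 3 and 0

T1 = np.zeros((len(Ytrain), K))
key_error_rows = []
for i in range(len(Ytrain)):
    try:
        T1[i, Ytrain[i]] = 1  # Ytrain[i] looks up the label i, not position i
    except KeyError:
        key_error_rows.append(i)  # label 1 is not in Ytrain's index

# An out-of-range class value fails differently: NumPy raises IndexError.
try:
    T1[0, K] = 1
    index_error_raised = False
except IndexError:
    index_error_raised = True
```

Note that row 0 does not raise at all here: the label 0 exists in the index, so the loop silently writes the class of a different sample. np.unique cannot reveal either problem, because it only inspects the values, not the index.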

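Answer 1 (score: 0)

This problem is solved. Somehow, Ytrain did not have a usable index (Ytrain[i] could not be looked up for every i). So I converted Ytrain to a fresh pandas Series: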

data = np.array(Ytrain)   # drops the old index, keeps the values in order
Ytrain1 = pd.Series(data) # new Series with a default 0..N1-1 index
N1 = len(Ytrain1)
T1 = np.zeros((N1, K))
for i in range(N1):
    print(i, Ytrain1[i])  # Prints fine
    T1[i, Ytrain1[i]] = 1
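
The same idea can be sketched without the loop or the intermediate Series, assuming the labels are integers in 0..K-1. The helper name one_hot and the sample data are hypothetical; converting to a plain NumPy array makes every lookup positional, so the pandas index no longer matters.

```python
import numpy as np
import pandas as pd

def one_hot(labels, num_classes):
    """Return an (N, num_classes) indicator matrix for integer class labels."""
    y = np.asarray(labels, dtype=int)  # drops any pandas index; order is positional
    if y.size and (y.min() < 0 or y.max() >= num_classes):
        raise ValueError("labels must lie in [0, num_classes)")
    T = np.zeros((y.size, num_classes))
    T[np.arange(y.size), y] = 1  # one assignment covering every row
    return T

Ytrain = pd.Series([2, 0, 1], index=[7, 3, 5])  # hypothetical shuffled labels
T1 = one_hot(Ytrain, 3)
```

An alternative that keeps the original loop is to reset the index right after the split, for example Ytrain = Ytrain.reset_index(drop=True), which again presumes Ytrain is a pandas Series.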