Converting Y to an indicator matrix works fine:
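import numpy as np

K = 9  # number of classes
file = 'dataset.csv'
X, Y = readFile(file)  # readFile is my own helper that loads the features and labels
N = len(Y)
T = np.zeros((N, K))
for i in range(N):
    T[i, Y[i]] = 1  # set the column of row i's class to 1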
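As an aside, the loop above can also be written as a single indexed assignment. The sketch below is only an illustration: the label array `Y` is made up, and it assumes the labels are integers in `0..K-1` stored in a NumPy array.

```python
import numpy as np

K = 9                          # number of classes, as in the question
Y = np.array([0, 8, 3, 3, 1])  # hypothetical labels, for illustration only
N = len(Y)

# Loop version, as in the question
T_loop = np.zeros((N, K))
for i in range(N):
    T_loop[i, Y[i]] = 1

# Equivalent vectorized version: pair row i with column Y[i]
T_vec = np.zeros((N, K))
T_vec[np.arange(N), Y] = 1
```

Both versions produce the same matrix, with exactly one 1 per row.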
But when I do the same thing after splitting into training and test sets, like this, I get an error:
X, Y = shuffle(X, Y)
Ntrain = int(0.7*len(X))
Xtrain, Ytrain = X[:Ntrain], Y[:Ntrain]
Xtest, Ytest = X[Ntrain:], Y[Ntrain:]
N1 = len(Ytrain)
T1 = np.zeros((N1, K))
for i in range(N1):
    T1[i, Ytrain[i]] = 1
The error is raised on the last line, T1[i, Ytrain[i]] = 1. Where am I going wrong? K is the number of classes (K = 9).
print(np.unique(Y))
print(np.unique(Ytrain))
The print statements above give:
[0 1 2 3 4 5 6 7 8]
[0 1 2 3 4 5 6 7 8]
Answer 0 (score: 1)
T1 has shape N1 × K, and you are assigning to column index Ytrain[i]. If Ytrain[i] >= K, that index is out of bounds and NumPy raises an IndexError.
Update:
for i in range(N1):
    print(i)  # show the row index being processed
    T1[i, Ytrain[i]] = 1
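A minimal sketch of the bounds check described in this answer. The `Ytrain` values here are hypothetical, with one label deliberately set to `K` so that the failure is visible; it assumes `Ytrain` is a plain NumPy integer array.

```python
import numpy as np

K = 9                            # number of classes, as in the question
Ytrain = np.array([0, 3, 8, 9])  # hypothetical labels; 9 is deliberately out of range

# Rows whose label cannot be used as a column index of an (N1, K) matrix
bad_rows = np.flatnonzero(Ytrain >= K)

T1 = np.zeros((len(Ytrain), K))
error_name = None
try:
    for i in range(len(Ytrain)):
        T1[i, Ytrain[i]] = 1
except IndexError as exc:
    # NumPy reports an out-of-bounds column as IndexError
    error_name = type(exc).__name__
```

If `bad_rows` is empty, as the `np.unique` output in the question suggests, out-of-range labels are probably not the cause and the exception type is worth checking.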
Answer 1 (score: 0)
This problem is solved. Somehow, Ytrain had no index, so I converted Ytrain to a pandas Series:
data = np.array(Ytrain)
Ytrain1 = pd.Series(data)
N1 = len(Ytrain1)
T1 = np.zeros((N1, K))
for i in range(N1):
    print(i, Ytrain1[i])  # Prints fine
    T1[i, Ytrain1[i]] = 1
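One possible explanation for why this works, offered as an assumption since the thread does not say what `readFile` or `shuffle` return: if `Ytrain` was a pandas Series whose index labels were permuted by the shuffle, then `Ytrain[i]` looks up the label `i` rather than position `i`, and fails with a KeyError whenever that label ended up in the test split. The sketch below reproduces that situation with made-up data and a fixed permutation.

```python
import numpy as np
import pandas as pd

K = 3                                # small class count, for illustration only
Y = pd.Series([0, 1, 2, 1, 0, 2])    # hypothetical labels with index 0..5
perm = [5, 3, 4, 1, 0, 2]            # fixed "shuffle" so the result is reproducible

Ytrain = Y.iloc[perm].iloc[:4]       # values [2, 1, 0, 1], index labels [5, 3, 4, 1]

# Label-based lookup: label 0 is not in the training split
lookup_failed = False
try:
    Ytrain[0]
except KeyError:
    lookup_failed = True

# Rebuilding from the raw values gives a fresh 0..N1-1 index, as in the answer
Ytrain1 = pd.Series(np.array(Ytrain))
N1 = len(Ytrain1)
T1 = np.zeros((N1, K))
for i in range(N1):
    T1[i, Ytrain1[i]] = 1
```

Under the same assumption, `Ytrain.reset_index(drop=True)` or positional access with `Ytrain.iloc[i]` would avoid the lookup failure as well.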