Question

This code raises the following error:

IndexError: invalid index to scalar variable.

on the line:

results.append(RMSPE(np.expm1(y_train[testcv]), [y[1] for y in y_test]))

How can I fix it?

import pandas as pd
import numpy as np
from sklearn import ensemble
from sklearn import cross_validation

def ToWeight(y):
    w = np.zeros(y.shape, dtype=float)
    ind = y != 0
    w[ind] = 1./(y[ind]**2)
    return w

def RMSPE(y, yhat):
    w = ToWeight(y)
    rmspe = np.sqrt(np.mean( w * (y - yhat)**2 ))
    return rmspe

forest = ensemble.RandomForestRegressor(n_estimators=10, min_samples_split=2, n_jobs=-1)

print ("Cross validations")
cv = cross_validation.KFold(len(train), n_folds=5)

results = []
for traincv, testcv in cv:
    y_test = np.expm1(forest.fit(X_train[traincv], y_train[traincv]).predict(X_train[testcv]))
    results.append(RMSPE(np.expm1(y_train[testcv]), [y[1] for y in y_test]))

testcv is:

[False False False ...,  True  True  True]

Answer 1

You are trying to index a scalar (non-iterable) value:

[y[1] for y in y_test]
#  ^ this is the problem

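When you write [y for y in y_test], you are already iterating over the values, so each y is a single value.

Your code is equivalent to trying to do the following: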

y_test = [1, 2, 3]
y = y_test[0] # y = 1
print(y[0]) # this line will fail
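With a plain Python list, as above, the failing line raises a TypeError ('int' object is not subscriptable). The IndexError from the question appears when the elements are NumPy scalars. The following is a minimal sketch of that case; the array values are made up for illustration and are not from the question's data.

```python
import numpy as np

# Hypothetical 1-D array standing in for the predictions in y_test
y_test = np.array([1.0, 2.0, 3.0])

y = y_test[0]  # a NumPy scalar (zero dimensions), not an array

caught = None
try:
    y[1]  # indexing a zero-dimensional value
except IndexError as exc:
    caught = exc

print(type(y).__name__, np.ndim(y))
print(repr(caught))
```

Because y has zero dimensions, there is nothing for the index 1 to select.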

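I'm not sure what you are trying to put into the results array, but you need to get rid of [y[1] for y in y_test].

If you want to append each y in y_test to results, you need to expand your list comprehension further: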

[results.append(..., y) for y in y_test]

Or just use a for loop:

for y in y_test:
    results.append(..., y)
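If the goal is one RMSPE score per fold, one possible fix is to pass the whole prediction array to RMSPE instead of indexing each element. The sketch below assumes that predict returns a 1-D array for a single-target regressor; the functions are renamed copies of the ones in the question, and y_true and y_pred are made-up stand-ins for np.expm1(y_train[testcv]) and y_test.

```python
import numpy as np

def to_weight(y):
    # Weight each sample by 1 / y**2, skipping zeros to avoid division by zero
    w = np.zeros(y.shape, dtype=float)
    ind = y != 0
    w[ind] = 1.0 / (y[ind] ** 2)
    return w

def rmspe(y, yhat):
    # Root mean squared percentage error
    w = to_weight(y)
    return np.sqrt(np.mean(w * (y - yhat) ** 2))

# Hypothetical values for a single fold (not from the question's data)
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])  # 1-D, one prediction per sample

results = []
results.append(rmspe(y_true, y_pred))  # pass the array itself, no y[1]
print(results)
```

In the question's loop this would correspond to results.append(RMSPE(np.expm1(y_train[testcv]), y_test)), provided y_test really is 1-D.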

Answer 2

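Basically, 1 is not a valid index for y. If you arrived here with the same error in your own code, check whether your y actually contains the index you are trying to access (in this case, index 1).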
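One way to run that check is to look at the number of dimensions of an element before indexing it. This is only a diagnostic sketch: describe is a hypothetical helper, and both arrays are invented examples. The second one illustrates a two-column output, where y[1] would be valid.

```python
import numpy as np

def describe(value):
    """Return (ndim, shape) so you can see whether indexing is possible."""
    arr = np.asarray(value)
    return arr.ndim, arr.shape

preds_1d = np.array([0.5, 1.5, 2.5])           # one value per sample
preds_2d = np.array([[0.1, 0.9], [0.8, 0.2]])  # two values per sample

elem_1d = describe(preds_1d[0])  # zero dimensions: y[1] would fail
elem_2d = describe(preds_2d[0])  # one dimension of length 2: y[1] works

print(elem_1d, elem_2d)
```

If an element reports zero dimensions, it is a scalar and cannot be indexed.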

Answer 3

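Inside the for you are iterating, and each element of that iteration may be a scalar, which has no index. If each element is a zero-dimensional array, a single variable, or a scalar rather than a list or array, you cannot index it.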

How to fix IndexError: invalid index to scalar variable
