Question

I want to use TimeSeriesSplit from sklearn on the following data frame to predict sum:

To prepare X and y, I do the following:

X = df.drop(['sum'],axis=1)
y = df['sum']

and then feed both of them to:

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X[train_index], X[test_index]
    y_train01, y_test01 = y[train_index], y[test_index]

Doing so, I get the following error:

KeyError: '[ 0  1  2 ...] not in index'

Here X is a data frame, and apparently that is what causes the error, because if I convert X to an array as follows:

X = X.values

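then it works. However, I need X as a data frame to evaluate the model later. Is there a way to keep X as a data frame and feed it to tscv without converting it to an array?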

Answer 1

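As @Jarad correctly said, if you have updated your pandas version, it no longer switches to integer-based indexing automatically the way earlier versions did. You need to use .iloc explicitly for integer-based slicing.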

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]
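
For a version that can be run end to end, here is a minimal sketch. The data frame and its column names a and b are made up for illustration (the question's actual data is not shown), and n_splits=3 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the question's data: two features plus 'sum'.
df = pd.DataFrame({"a": np.arange(10), "b": np.arange(10, 20)})
df["sum"] = df["a"] + df["b"]

X = df.drop(["sum"], axis=1)
y = df["sum"]

tscv = TimeSeriesSplit(n_splits=3)  # n_splits chosen arbitrarily

folds = []
for train_index, test_index in tscv.split(X):
    # .iloc selects rows by position, so X stays a DataFrame and y a Series.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    folds.append((X_train, X_test, y_train, y_test))
```

Each entry in folds keeps the original column names and row index, so the pieces can be used directly for later evaluation.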

See https://pandas.pydata.org/pandas-docs/stable/indexing.html
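
To illustrate why plain square brackets fail here: on a DataFrame, X[array_of_ints] is treated as a lookup of column labels, not row positions. The small sketch below uses a made-up frame with string column names, so none of the integers match a column and a KeyError is raised, while .iloc selects rows by position.

```python
import numpy as np
import pandas as pd

# Made-up frame for illustration; the columns are labelled "a" and "b".
X = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3]})
positions = np.array([0, 1])

try:
    X[positions]  # looks for columns named 0 and 1, which do not exist
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional row selection; the result is still a DataFrame.
first_rows = X.iloc[positions]
```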
