Question

I have a dataset named df_noyau_yes, and I want to apply ShuffleSplit to split it into training and test sets in order to train an autoencoder.

The problem is that this function returns the indices of the shuffled data. I tried to extract the data at those indices so I could feed it to the autoencoder, but it does not work and raises an error: KeyError: 223

Here is the code:

rs = ShuffleSplit(n_splits=2, test_size=.25, random_state=0)
rs.get_n_splits(df_noyau_yes)
for train_index, test_index in rs.split(df_noyau_yes):
    print("TRAIN:", train_index, "TEST:", test_index)
    #X_train, X_test = df_noyau_yes[train_index], df_noyau_yes[test_index]
    x_train=[]
    for x in train_index:
        x_train = np.append(x_train, df_noyau_yes[x])
        print(x_train)
    print("training set",x_train)

example.com/show?id_item=1

Is there a solution?

Answer 1

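To select DataFrame values by integer row and column position, use iloc.

From the documentation:

The .iloc attribute is the primary access method. The following are valid inputs: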
An integer e.g. 5
A list or array of integers [4, 3, 0]
A slice object with ints 1:7
A boolean array
A callable, see Selection By Callable
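As a small illustration of those input types, here is a sketch on a toy DataFrame. The frame `df` and its column names are made up for the example and are not taken from the question. It also shows why a plain `df[x]` with an integer is likely to fail: square brackets on a DataFrame look up a column label, not a row position.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names "a" and "b" are assumptions.
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1.0, 2.0, 3.0, 4.0]})

single_row = df.iloc[2]                      # an integer: one row, returned as a Series
picked = df.iloc[np.array([3, 0])]           # an array of integers: rows 3 and 0, in that order
sliced = df.iloc[1:3]                        # a slice: rows 1 and 2 (the end is excluded)
masked = df.iloc[np.array([True, False, True, False])]  # a boolean array: rows 0 and 2

# df[2] searches the column labels for 2, which does not exist in this frame.
try:
    df[2]
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

The same mechanism would explain a KeyError for 223 in the question: the loop passes a row position to `df_noyau_yes[x]`, which is interpreted as a column label.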

So you only need to pass train_index and test_index to iloc to get the corresponding rows.

x_train = df_noyau_yes.iloc[train_index].copy()
x_test = df_noyau_yes.iloc[test_index].copy()

I use copy() here as an extra precaution: without it, trying to change values in x_train or x_test can make pandas emit a warning (SettingWithCopyWarning).
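Putting the pieces together, a minimal end-to-end sketch might look like the following. The DataFrame here is a hypothetical stand-in for df_noyau_yes (8 rows, 3 invented columns), since the real data is not shown in the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-in for df_noyau_yes; shape and column names are assumptions.
df_noyau_yes = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["f0", "f1", "f2"])

rs = ShuffleSplit(n_splits=2, test_size=0.25, random_state=0)

splits = []
for train_index, test_index in rs.split(df_noyau_yes):
    # train_index and test_index are arrays of row positions, so index with iloc.
    x_train = df_noyau_yes.iloc[train_index].copy()
    x_test = df_noyau_yes.iloc[test_index].copy()
    splits.append((x_train, x_test))
    print("training set shape:", x_train.shape, "test set shape:", x_test.shape)
```

If the autoencoder expects NumPy arrays rather than DataFrames, x_train.to_numpy() should give the underlying values.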

Sklearn problem with ShuffleSplit
