Question

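I have a DataFrame and need to evaluate quality before applying the nearest neighbors method. I am using sklearn.cross_validation.KFold, but I don't know how to pass a DataFrame to this function.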

quality = KFold(df, n_folds=5, shuffle=True, random_state=42)

But it returns

TypeError: int() argument must be a string, a bytes-like object or a number, not 'DataFrame'

How can I fix this?

Answer 1

You should pass the number of rows you want to split, not the DataFrame itself:

quality = KFold(len(df), n_folds=5, shuffle=True, random_state=42)

This uses the number of rows in df and yields pairs of train and test index arrays, which you can then use to slice df:

for train_index, test_index in quality:
    # do something with slices
    df.iloc[train_index]
    df.iloc[test_index]

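If your df index is an int64 index that is monotonically increasing from 0 (that is, the default 0, 1, ..., n-1), you can use loc instead of iloc.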
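To illustrate why iloc is the safer choice, here is a minimal sketch. The DataFrame, its index labels, and the hard-coded index arrays are hypothetical stand-ins for one (train_index, test_index) pair; they are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels do not start at 0.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=[10, 11, 12, 13, 14])

# Stand-ins for one (train_index, test_index) pair: these are positions, not labels.
train_index = np.array([0, 1, 3])
test_index = np.array([2, 4])

# iloc selects by position, so it works regardless of the index labels.
train = df.iloc[train_index]
test = df.iloc[test_index]

# loc selects by label; 0, 1 and 3 are not labels in this frame, so it fails.
try:
    df.loc[train_index]
    loc_works = True
except KeyError:
    loc_works = False

# With the default 0..n-1 index, loc and iloc select the same rows.
df0 = df.reset_index(drop=True)
same = df0.loc[train_index].equals(df0.iloc[train_index])
```

If the index has been filtered, sorted, or otherwise changed, either stick with iloc or call reset_index(drop=True) first.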

Answer 2

Does the documentation say that a DataFrame can be passed as the first argument? No. It accepts a number. Read the documentation carefully.
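A note for readers on newer scikit-learn releases: the sklearn.cross_validation module was deprecated and later removed. KFold now lives in sklearn.model_selection, takes n_splits instead of n_folds, and receives the data through its split() method, which does accept a DataFrame. A minimal sketch, assuming a small hypothetical DataFrame:

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy data; replace with your own DataFrame.
df = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})

# The splitter is configured without data; the data goes to split().
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
for train_index, test_index in kf.split(df):
    # split() yields positional indices, so slice with iloc.
    train, test = df.iloc[train_index], df.iloc[test_index]
    fold_sizes.append((len(train), len(test)))
```

With 10 rows and 5 folds, each iteration holds out 2 rows for testing and keeps 8 for training.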