Question

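I'm working through a Kaggle tutorial. I get the basic idea from looking at the output and reading the docs, but I'd like confirmation of what is happening here: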

predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)

predictions = []

for train, test in kf:
    train_predictors = titanic[predictors].iloc[train, :]

My main question is about the last line, the one that uses iloc. The rest is just context. Is it simply splitting out the training data?

Answer 1

.iloc[] is the primary method for accessing the row and column index of pandas DataFrames by integer position (or of a Series, in which case there is only the index). It is explained well in the Indexing docs.
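As a minimal sketch of position-based selection (the frame, its labels, and its values are made up for illustration and are not the tutorial's data), passing an array of integer positions to .iloc looks like this:

```python
import numpy as np
import pandas as pd

# Toy frame with string labels so that positions and labels clearly differ.
df = pd.DataFrame(
    {"Age": [22.0, 38.0, 26.0, 35.0], "Fare": [7.25, 71.28, 7.92, 53.10]},
    index=["a", "b", "c", "d"],
)

rows = np.array([2, 3])          # integer positions, not labels
subset = df.iloc[rows, :]        # rows at positions 2 and 3, all columns
print(subset.index.tolist())     # ['c', 'd']

ages = df["Age"]                 # a Series has a single axis
print(ages.iloc[rows].tolist())  # [26.0, 35.0]
```

The `:` in the second slot means "every column", which is why the question's line keeps all predictor columns.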

In this specific case, from the scikit-learn KFold docs:

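KFold divides all the samples into k groups of samples, called folds (if k = n, this is equivalent to the Leave One Out strategy), of equal sizes (if possible). The prediction function is learned using k - 1 folds, and the fold left out is used for testing. Example of 2-fold cross-validation on a dataset with 4 samples:

import numpy as np
from sklearn.cross_validation import KFold

kf = KFold(4, n_folds=2)
for train, test in kf:
    print("%s %s" % (train, test))

[2 3] [0 1]
[0 1] [2 3]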

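In other words, KFold produces arrays of index positions. The for loop over kf yields them as train and test, and train is passed to .iloc, which selects the corresponding rows (and all columns) from the titanic[predictors] DataFrame. So yes, that line just extracts the training set for the current fold.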
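The loop can be sketched end to end as follows. This assumes a recent scikit-learn, where the sklearn.cross_validation module used in the question no longer exists and KFold lives in sklearn.model_selection with a split() method. The small titanic frame is made-up stand-in data with only a few of the tutorial's columns.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Made-up stand-in for the tutorial's DataFrame (6 rows, illustrative values).
titanic = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 27.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 13.00],
    "Survived": [0, 1, 1, 1, 0, 0],
})
predictors = ["Pclass", "Age", "Fare"]

kf = KFold(n_splits=3)  # no shuffling, so each test fold is a contiguous block
fold_shapes = []
for train, test in kf.split(titanic):
    # train and test are NumPy arrays of integer row positions.
    train_predictors = titanic[predictors].iloc[train, :]
    test_predictors = titanic[predictors].iloc[test, :]
    fold_shapes.append((train_predictors.shape, test_predictors.shape))
    print(train, test)
```

With 6 rows and 3 folds, each iteration trains on 4 rows and holds out 2. The target column can be sliced the same way, for example titanic["Survived"].iloc[train].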
