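When we call KFold.split(X), where X is a DataFrame, it generates indices that split the data into training and test sets. Are these indices meant for iloc (purely integer-location based indexing, for selection by position) or for loc (access to a group of rows and columns by label)?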
Answer 0 (score: 1)
You need DataFrame.iloc to select the rows, because KFold.split yields integer positions, not index labels.
Example:
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((10,5)), columns=list('ABCDE'))
# change the default index so that labels (0, 10, ...) differ from positions (0, 1, ...)
df.index = df.index * 10
print (df)
           A         B         C         D         E
0   0.543405  0.278369  0.424518  0.844776  0.004719
10  0.121569  0.670749  0.825853  0.136707  0.575093
20  0.891322  0.209202  0.185328  0.108377  0.219697
30  0.978624  0.811683  0.171941  0.816225  0.274074
40  0.431704  0.940030  0.817649  0.336112  0.175410
50  0.372832  0.005689  0.252426  0.795663  0.015255
60  0.598843  0.603805  0.105148  0.381943  0.036476
70  0.890412  0.980921  0.059942  0.890546  0.576901
80  0.742480  0.630184  0.581842  0.020439  0.210027
90  0.544685  0.769115  0.250695  0.285896  0.852395
from sklearn.model_selection import KFold
# shuffle with a fixed random_state so the split is reproducible
kf = KFold(n_splits = 5, shuffle = True, random_state = 2)
# take only the first (train, test) pair of index arrays
result = next(kf.split(df), None)
print (result)
(array([0, 2, 3, 5, 6, 7, 8, 9]), array([1, 4]))
# the arrays hold positions, so select with iloc
train = df.iloc[result[0]]
test = df.iloc[result[1]]
print (train)
           A         B         C         D         E
0   0.543405  0.278369  0.424518  0.844776  0.004719
20  0.891322  0.209202  0.185328  0.108377  0.219697
30  0.978624  0.811683  0.171941  0.816225  0.274074
50  0.372832  0.005689  0.252426  0.795663  0.015255
60  0.598843  0.603805  0.105148  0.381943  0.036476
70  0.890412  0.980921  0.059942  0.890546  0.576901
80  0.742480  0.630184  0.581842  0.020439  0.210027
90  0.544685  0.769115  0.250695  0.285896  0.852395
print (test)
           A         B         C         D         E
10  0.121569  0.670749  0.825853  0.136707  0.575093
40  0.431704  0.940030  0.817649  0.336112  0.175410
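
To make the difference between positions and labels concrete, here is a minimal sketch that loops over all five folds instead of only the first one. The frame contents and the names folds, test_labels, same and loc_failed are illustrative assumptions and are not part of the original answer; the sketch also assumes a recent pandas version, where loc raises KeyError for missing labels.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Illustrative frame: labels are 0, 10, ..., 90 while positions are 0..9
df = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("ABCDE"))
df.index = df.index * 10

kf = KFold(n_splits=5, shuffle=True, random_state=2)

# split() yields arrays of integer positions, so iloc is the matching accessor
folds = []
for train_pos, test_pos in kf.split(df):
    folds.append((df.iloc[train_pos], df.iloc[test_pos]))

# If labels are needed, translate positions to labels first, then use loc
train_pos, test_pos = next(kf.split(df))
test_labels = df.index[test_pos]
same = df.loc[test_labels].equals(df.iloc[test_pos])

# Passing the positions straight to loc fails here, because labels
# such as 1 or 3 do not exist in this index
try:
    df.loc[train_pos]
    loc_failed = False
except KeyError:
    loc_failed = True

print(same, loc_failed)
```

With the default RangeIndex, positions and labels coincide, so loc would appear to work. That is why the mistake tends to show up only after filtering, sorting, or otherwise changing the index.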