Unexpected training set size when cross-validating with Lenskit

Time: 2020-03-14 23:42:13

Tags: python machine-learning cross-validation lenskit

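I have been using Lenskit's partition_users function to generate k folds for cross-validation. The function takes three arguments: the data, partitions (the number of partitions to generate), and a sampling method, here SampleFrac (the fraction held out for testing). I varied the number of partitions and then checked the sizes of the resulting test and training splits: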

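import pandas as pd
from lenskit import crossfold as xf

# `ratings` is assumed to be an already-loaded DataFrame with user, item, rating columns
for i in range(1, 6):
    training = []
    testing = []
    # collect the train/test frames of every partition for this partition count
    for train, test in xf.partition_users(ratings[['user', 'item', 'rating']], i, xf.SampleFrac(0.2)):
        training.append(train)
        testing.append(test)
    testing = pd.concat(testing, ignore_index=True)
    training = pd.concat(training, ignore_index=True)
    print("Shape of testing:", testing.shape)
    print("Shape of training:", training.shape)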

Output:

Shape of testing: (20000, 3)
Shape of training: (80000, 3)
Shape of testing: (20000, 3)
Shape of training: (180000, 3)
Shape of testing: (20000, 3)
Shape of training: (280000, 3)
Shape of testing: (20000, 3)
Shape of training: (380000, 3)
Shape of testing: (20000, 3)
Shape of training: (480000, 3)

I am trying to understand why the chosen number of partitions and SampleFrac produce this output. I expected the following:

Shape of testing: (20000, 3)
Shape of training: (80000, 3)
Shape of testing: (40000, 3)
Shape of training: (160000, 3)
Shape of testing: (60000, 3)
Shape of training: (240000, 3)
Shape of testing: (80000, 3)
Shape of training: (320000, 3)
Shape of testing: (100000, 3)
Shape of training: (400000, 3)

Can someone explain where I went wrong?
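
One reading that seems consistent with the observed numbers: users are split into k disjoint groups, each user's held-out fraction is drawn only in the fold where that user is a test user, and every fold's training set is all remaining rows of the full data. The sketch below is a toy re-implementation of that idea in plain pandas, not Lenskit's actual code; toy_partition_users, the 20-user synthetic frame, and the seed are all made up for illustration.

```python
import numpy as np
import pandas as pd


def toy_partition_users(data, partitions, frac, seed=0):
    """Hypothetical stand-in for user-based partitioning (not Lenskit's code).

    Users are split into `partitions` disjoint groups. For each group, a
    fraction `frac` of every group member's rows becomes the test set, and
    all remaining rows of the whole frame become the training set.
    """
    rng = np.random.default_rng(seed)
    users = rng.permutation(data["user"].unique())
    for group in np.array_split(users, partitions):
        candidates = data[data["user"].isin(group)]
        # sample `frac` of the rows of each test user (needs pandas >= 1.1)
        test = candidates.groupby("user", group_keys=False).sample(
            frac=frac, random_state=seed
        )
        train = data.drop(test.index)
        yield train, test


# synthetic data: 20 users with 10 ratings each, 200 rows in total
ratings = pd.DataFrame(
    [(u, i, 3.0) for u in range(20) for i in range(10)],
    columns=["user", "item", "rating"],
)

sizes = {}
for k in range(1, 6):
    folds = list(toy_partition_users(ratings, k, 0.2))
    n_test = sum(len(test) for _, test in folds)
    n_train = sum(len(train) for train, _ in folds)
    sizes[k] = (n_test, n_train)
    print(k, n_test, n_train)
```

Under that assumption the concatenated test size stays at 0.2 * N for every k, while the concatenated training size is k * N - 0.2 * N. With N = 100000 this gives 80000, 180000, 280000, 380000 and 480000, the same values as in the output above.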

0 answers:

No answers yet