Question

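I have been using Lenskit's partition_users function to generate the k folds for cross-validation. The function takes three arguments: the data, partitions (the number of partitions to generate), and a SampleFrac (the sampling fraction used for CV). I varied the number of partitions and then looked at the sizes of the resulting test and training splits: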

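import pandas as pd
import lenskit.crossfold as xf

# ratings is a DataFrame with 'user', 'item' and 'rating' columns, loaded beforehand
for i in range(1, 6):
    training = []
    testing = []
    # collect the train and test frames from every partition
    for train, test in xf.partition_users(ratings[['user', 'item', 'rating']], i, xf.SampleFrac(0.2)):
        training.append(train)
        testing.append(test)
    # stack all partitions together before measuring their size
    testing = pd.concat(testing, ignore_index=True)
    training = pd.concat(training, ignore_index=True)
    print("Shape of testing:", testing.shape)
    print("Shape of training:", training.shape)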

Output:

Shape of testing: (20000, 3)
Shape of training: (80000, 3)
Shape of testing: (20000, 3)
Shape of training: (180000, 3)
Shape of testing: (20000, 3)
Shape of training: (280000, 3)
Shape of testing: (20000, 3)
Shape of training: (380000, 3)
Shape of testing: (20000, 3)
Shape of training: (480000, 3)

I am trying to understand why the chosen number of partitions and the SampleFrac lead to this output. I expected the following instead:

Shape of testing: (20000, 3)
Shape of training: (80000, 3)
Shape of testing: (40000, 3)
Shape of training: (160000, 3)
Shape of testing: (60000, 3)
Shape of training: (240000, 3)
Shape of testing: (80000, 3)
Shape of training: (320000, 3)
Shape of testing: (100000, 3)
Shape of training: (400000, 3)

Can someone explain to me where I went wrong?
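For reference, the observed totals can be reproduced with plain pandas under one assumption that the numbers suggest: the users are split into disjoint groups, each partition holds out 20% of the ratings of its own group only, and every other row goes to that partition's training set. The sketch below is a hypothetical stand-in, not Lenskit's implementation. The function name simulate_user_partitions and the synthetic data (1000 users with 100 ratings each) are made up for illustration.

```python
import numpy as np
import pandas as pd


def simulate_user_partitions(data, k, frac, seed=0):
    """Hypothetical stand-in for xf.partition_users (an assumption, not Lenskit code)."""
    rng = np.random.default_rng(seed)
    # shuffle the users and cut them into k disjoint groups
    users = rng.permutation(data['user'].unique())
    for group in np.array_split(users, k):
        # only users of this group contribute test rows
        candidates = data[data['user'].isin(group)]
        test = candidates.groupby('user').sample(frac=frac, random_state=seed)
        # everything that is not a test row of this partition is training data
        train = data.drop(test.index)
        yield train, test


# synthetic ratings: 1000 users x 100 items = 100000 rows
ratings = pd.DataFrame({
    'user': np.repeat(np.arange(1000), 100),
    'item': np.tile(np.arange(100), 1000),
    'rating': 3.0,
})

totals = {}
for k in range(1, 6):
    splits = list(simulate_user_partitions(ratings, k, 0.2))
    n_test = sum(len(test) for _, test in splits)
    n_train = sum(len(train) for train, _ in splits)
    totals[k] = (n_test, n_train)
    print("partitions:", k, "test rows:", n_test, "train rows:", n_train)
```

Under this assumption each user is tested exactly once, so the concatenated test set stays at 0.2 * N rows for any k, while each partition's training set holds N minus its own test rows, giving k * N - 0.2 * N rows in total.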

Unexpected training set size when cross-validating with Lenskit

0 answers: