I have a list like this with about 141 entries:
training = [40.0,49.0,77.0,...... 3122.0]
My goal is to select the first 20% of the list, then the next 20%, and so on. This is what I'm doing:
testfile_first20 = training[0:int(len(set(training))*0.2)]
testfile_second20 = training[int(len(set(training))*0.2):int(len(set(training))*0.4)]
testfile_third20 = training[int(len(set(training))*0.4):int(len(set(training))*0.6)]
testfile_fourth20 = training[int(len(set(training))*0.6):int(len(set(training))*0.8)]
testfile_fifth20 = training[int(len(set(training))*0.8):]
Is there a way to do this automatically in a loop? This is how I'm selecting the folds for KFold.
Thanks.
Answer 0 (score: 1)
You can use a list comprehension:
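# Use len(training), not len(set(training)): set() drops duplicate values, so it would undercount the entries
div_length = int(0.2*len(training))
testfile_divisions = [training[i*div_length:(i+1)*div_length] for i in range(5)]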
This collects your results in a single list, equivalent to:
[testfile_first20, testfile_second20, testfile_third20, testfile_fourth20, testfile_fifth20]
If len(training) doesn't divide evenly into five parts, you can either make five full parts plus a sixth part holding the remainder, as follows:
import math
div_length = math.floor(0.2*len(training))
testfile_divisions = [training[i*div_length:min(len(training), (i+1)*div_length)] for i in range(6)]
Or you can make four full parts and put the remainder in the fifth, as follows:
import math
div_length = math.ceil(0.2*len(training))
testfile_divisions = [training[i*div_length:min(len(training), (i+1)*div_length)] for i in range(5)]
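A third option, which is an addition here and not something proposed in the answer above, is to spread the remainder over the first few parts with divmod, so the five parts differ in size by at most one element. This is the same sizing rule that numpy.array_split uses. The helper name split_even and the stand-in data are hypothetical; this is a minimal sketch, not a drop-in for any library function.

```python
def split_even(items, k=5):
    """Split items into k consecutive chunks whose sizes differ by at most one.

    Hypothetical helper: the first len(items) % k chunks get one extra element.
    """
    base, extra = divmod(len(items), k)
    chunks = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

# Stand-in for the 141-entry training list from the question
training = [float(x) for x in range(141)]
testfile_divisions = split_even(training)
print([len(chunk) for chunk in testfile_divisions])  # [29, 28, 28, 28, 28]
```

Unlike the two versions above, this always yields exactly five non-overlapping parts that together contain every element.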
Answer 1 (score: 1)
Here is a simple way using a list comprehension:
lst = list('abcdefghijkl')
l = len(lst)
[lst[i:i+l//5] for i in range(0, l, l//5)]
# [['a', 'b'],
# ['c', 'd'],
# ['e', 'f'],
# ['g', 'h'],
# ['i', 'j'],
# ['k', 'l']]
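Edit: Actually, now that I look at my answer again, it isn't a true 20% split, because it returns 6 sublists instead of 5. What should happen when the list can't be divided evenly into 5 parts? I'll leave this as is until that's clarified.
Answer 2 (score: 0)
You can do this in a loop by storing just the 20% "size" and the current starting point in two variables, then adding one to the other on each iteration: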
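start = 0
twenty_pct = len(training) // 5   # size of one 20% chunk, rounded down
parts = []
for k in range(5):
    parts.append(training[start:start+twenty_pct])
    start += twenty_pct
# Note: the len(training) % 5 leftover items at the end are not included in parts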
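However, I suspect there are numpy/pandas/scipy operations better suited to your needs. For example, sklearn includes a class called KFold: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
Answer 3 (score: 0)
Something like this, though because of rounding the chunk sizes may differ by one element.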
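Since the goal is k-fold cross-validation, each 20% chunk is used once as the test fold while the other four chunks form the training set. The sketch below does that in plain Python, assuming contiguous, unshuffled folds. Its fold sizes follow the rule documented for sklearn's KFold with shuffle=False (the first n_samples % n_splits folds get one extra sample), but it is not sklearn's implementation, and the function name kfold_splits is hypothetical.

```python
def kfold_splits(items, k=5):
    """Yield (train, test) pairs for contiguous, unshuffled k-fold splits."""
    base, extra = divmod(len(items), k)
    start = 0
    for i in range(k):
        # The first len(items) % k folds are one element larger than the rest
        stop = start + base + (1 if i < extra else 0)
        test = items[start:stop]
        train = items[:start] + items[stop:]  # everything outside the test fold
        yield train, test
        start = stop

training = [float(x) for x in range(141)]  # stand-in data
for fold, (train, test) in enumerate(kfold_splits(training)):
    print(fold, len(train), len(test))
```

If you already use sklearn, its KFold class yields index arrays rather than lists, so you would index into your data with them instead.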
tlen = float(len(training))
testfiles = [ training[ int(i*0.2*tlen): int((i+1)*0.2*tlen) ] for i in range(5) ]
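The same idea can be written with integer arithmetic only, which keeps the boundaries exact: chunk i runs from i*n//k to (i+1)*n//k. Consecutive chunks share a boundary and the last one ends at n, so every element lands in exactly one chunk, and the last chunk runs to the end of the list just like the question's fifth slice. This is a sketch with stand-in data; the names n and k are illustrative.

```python
training = [float(x) for x in range(141)]  # stand-in for the real data
n = len(training)
k = 5
# Boundaries i*n//k for i = 0..k; integer division avoids float rounding
testfiles = [training[i * n // k:(i + 1) * n // k] for i in range(k)]
print([len(t) for t in testfiles])  # [28, 28, 28, 28, 29]
```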