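This is a follow-up to my earlier question (text file reduction with randomization in Python). I modified the script to run several reductions, but only the first output file contains the reduced data; the following three files have size zero. This must be something obvious that I'm not seeing...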
#!/usr/bin/env python
import random
import sys
from itertools import chain, groupby

def choose_random(iterator, fraction, random=random.random):
    """Lazy analog of:

        L = list(iterator)
        k = int(len(L) * fraction + .5) or 1 # get at least one
        result = random.sample(L, k)

    Note: this function doesn't randomize the order of elements
          that would require to keep selected elements in memory
          and number of output elements is not exactly k
    """
    # always yield at least one item if input is not empty
    item = next(iterator)
    it = (x for x in chain([item], iterator) if random() < fraction)
    for x in chain([next(it, item)], it):
        yield x

def getkey(line):
    return line.split("\t")[3] # 4th column

reductions = [0.25, 0.50, 0.75, 1]
filename = "foo"
outfile = [open("-".join([x, filename]), "w") for x in map(str, reductions)]
try:
    with open(filename, "r") as f:
        for ln, k in enumerate(map(float, reductions)):
            for key, group in groupby(f, key=getkey):
                outfile[ln].writelines(choose_random(group, fraction=k))
finally:
    for f in outfile:
        f.close()
The output looks like this (file 0.25-foo contains the correct reduction, the rest are empty):
-rw-r--r-- 1 staff staff 53326048 Mar 27 03:42 0.25-foo
-rw-r--r-- 1 staff staff 0 Mar 27 03:42 0.5-foo
-rw-r--r-- 1 staff staff 0 Mar 27 03:42 0.75-foo
-rw-r--r-- 1 staff staff 0 Mar 27 03:42 1-foo
Answer 0 (score: 3)
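You open foo once but try to iterate over it four times. At the end of the first reduction the file object is at end of file, so the remaining three groupby loops receive no lines and write nothing. Either re-open it for each reduction: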
try:
    for ln, k in enumerate(map(float, reductions)):
        with open(filename, "r") as f:
            for key, group in groupby(f, key=getkey):
                outfile[ln].writelines(choose_random(group, fraction=k))
finally:
    for f in outfile:
        f.close()
or rewind it after each reduction:
try:
    with open(filename, "r") as f:
        for ln, k in enumerate(map(float, reductions)):
            for key, group in groupby(f, key=getkey):
                outfile[ln].writelines(choose_random(group, fraction=k))
            f.seek(0)
finally:
    for f in outfile:
        f.close()
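To see the underlying behavior in isolation, here is a minimal sketch that uses an in-memory io.StringIO object as a stand-in for the input file (the sample lines are made up for illustration). It shows that a second pass over the same file object yields nothing until the position is reset with seek(0):

```python
import io

# In-memory stand-in for the input file (hypothetical sample data).
f = io.StringIO("a\nb\nc\n")

first_pass = list(f)    # consumes every line
second_pass = list(f)   # the iterator is already at end of file
f.seek(0)               # rewind to the beginning
third_pass = list(f)    # the lines are available again

print(first_pass)
print(second_pass)
print(third_pass)
```

A file opened with open() behaves the same way here, which is why only the first reduction in the question produced any output.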
I would open both files in the same with statement, once per reduction:
reductions = [0.25, 0.50, 0.75, 1.0]
filename = "foo"
for fraction in reductions:
    with open(filename, "r") as f, open('%s-%s' % (fraction, filename), 'w') as outfile:
        for key, group in groupby(f, key=getkey):
            outfile.writelines(choose_random(group, fraction=fraction))
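For anyone who wants to check this end to end, the following is a self-contained sketch of the same approach. The reduce_file wrapper, the temporary directory, and the generated sample data are assumptions added for illustration and are not part of the original script; choose_random and getkey are carried over from the question with a shortened docstring.

```python
import os
import random
import tempfile
from itertools import chain, groupby


def choose_random(iterator, fraction, random=random.random):
    """Yield roughly `fraction` of the items, and at least one."""
    item = next(iterator)  # groupby never produces an empty group
    it = (x for x in chain([item], iterator) if random() < fraction)
    for x in chain([next(it, item)], it):
        yield x


def getkey(line):
    return line.split("\t")[3]  # 4th column


def reduce_file(filename, fractions):
    """Hypothetical wrapper: write one reduced copy per fraction, return the paths."""
    directory, base = os.path.split(filename)
    outputs = []
    for fraction in fractions:
        outname = os.path.join(directory, "%s-%s" % (fraction, base))
        # Re-opening the input here gives every fraction a fresh iterator.
        with open(filename, "r") as f, open(outname, "w") as outfile:
            for key, group in groupby(f, key=getkey):
                outfile.writelines(choose_random(group, fraction=fraction))
        outputs.append(outname)
    return outputs


# Made-up sample data: 3 keys in the 4th column, 20 lines each.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "foo")
with open(path, "w") as sample:
    for key in ("k1", "k2", "k3"):
        for i in range(20):
            sample.write("a\tb\tc\t%s\t%d\n" % (key, i))

paths = reduce_file(path, [0.25, 0.50, 0.75, 1.0])
sizes = [os.path.getsize(p) for p in paths]
print(sizes)
```

Every output file should now be non-empty, and the one written with fraction 1.0 should be the same size as the input, since random() always returns a value below 1.0. The exact sizes of the other files vary from run to run because the selection is random.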