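I am trying to extract the first 100 occurrences of each item in a list from file_1.txt into a new file. The list of items (called target in the code below) contains the values from the first column of file_1.txt.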
file_1.txt
now:::ADV 1.48 be:::V 1.85 5488284
then:::ADV 1.44 be:::V 1.85 3994804
now:::ADV 1.48 have:::V 2.18 1760901
then:::ADV 1.44 have:::V 2.18 1099284
enough:::ADV 1.33 be:::V 1.85 928947
suppose:::V 1.37 be:::V 1.85 874407
ever:::ADV 1.48 be:::V 1.85 859428
Here is the code I tried:
with open('file_1.txt', 'r') as infile, open('file_2.txt', 'w') as outfile:
    target = []
    i = 1
    for line in infile:
        columns = line.split("\t")
        column_1 = columns[0]
        if column_1 not in target:
            target.append(column_1)
        for item in target:
            if line.startswith(item) and i <= 100:
                outfile.write(line)
                i += 1
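Of course, this just writes 100 lines of file_1.txt to file_2.txt. Is there a Pythonic way to read one line at a time, append its first column to target, find the first 100 occurrences of that word, write them to file_2.txt, and then continue with the next unique word in column 1 of file_1.txt?
I would really appreciate any help or suggestions.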
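Answer 0 (score: 1)
If I understand your requirements correctly, they cannot be met without buffering. The following approach uses a dictionary: it collects the lines for each word and writes them out once 100 have been gathered. It is based on your code: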
with open('file_1.txt', 'r') as infile, open('file_2.txt', 'w') as outfile:
    target = {}
    for line in infile:
        columns = line.split("\t")
        column_1 = columns[0]
        try:
            target[column_1].append(line)
            if len(target[column_1]) == 100:
                for tline in target[column_1]:
                    outfile.write(tline)
                target[column_1] = None  # mark word as finished
        except KeyError:  # we haven't seen that word before -> start new list
            target[column_1] = [line]
        except AttributeError:  # this is raised each time we try appending to None
            pass
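
Note that the dictionary version above writes a word's lines only once that word reaches 100 occurrences, so words that appear fewer than 100 times never reach file_2.txt. If the output does not need to be grouped by word, a per-word counter may be enough and no lines have to be buffered. The following is only a minimal sketch under that assumption; the name first_n_per_key and the limit parameter are made up for illustration, and it assumes tab-separated columns as in the code above:

```python
from collections import Counter


def first_n_per_key(lines, limit=100):
    """Yield each line whose first tab-separated column has been seen
    at most `limit` times so far (input order is preserved)."""
    seen = Counter()  # first-column value -> number of lines seen so far
    for line in lines:
        key = line.split("\t", 1)[0]  # assumes columns are tab-separated
        seen[key] += 1
        if seen[key] <= limit:
            yield line
```

Inside the same with open(...) block as above, this could be used as outfile.writelines(first_n_per_key(infile)). The lines come out in their original file order rather than grouped by word, which is the trade-off for not buffering.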