Given an input file data.dat like this:
# Some comment
# more comments
#
45.78
# aaa
0.056
0.67
# aaa
345.
0.78
99.
2.34
# aaa
65.7
0.9
I need to add a different comment above each line that starts with "# aaa", so that the file looks like this:
# Some comment
# more comments
#
45.78
# cmmt1
# aaa
0.056
0.67
# another cmmt
# aaa
345.
0.78
99.
2.34
# last one
# aaa
65.7
0.9
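I know in advance how many "# aaa" comments appear in data.dat, but not where they are.
I have a way to do this (see the code below), but it is convoluted and not very efficient. I need to apply this code to hundreds of large files, so I am looking for an efficient way to do it.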
# Read file
with open("data.dat", mode="r") as f:
    data = f.readlines()

# Indexes of "# aaa" comments
idx = []
for i, line in enumerate(data):
    if line.startswith("# aaa"):
        idx.append(i)

# Insert new comments in their proper positions
add_data = ["# cmmt1\n", "# another cmmt\n", "# last one\n"]
for i, j in enumerate(idx):
    data.insert(j + i, add_data[i])

# Write final data to file
with open("data_final.dat", mode="w") as f:
    for item in data:
        f.write("{}".format(item))
Answer 0 (score: 2)
I haven't done any benchmarking, but re.sub may be faster: load the whole text file, run re.sub on it, and write the result out:
data = '''# Some comment
# more comments
#
45.78
# aaa
0.056
0.67
# aaa
345.
0.78
99.
2.34
# aaa
65.7
0.9'''
import re
def fn():
    add_data = ["# cmmt1\n", "# another cmmt\n", "# last one\n"]
    for d in add_data:
        yield d

out = re.sub(r'^# aaa', lambda r, f=fn(): next(f) + r.group(0), data, flags=re.MULTILINE)
print(out)
Prints:
# Some comment
# more comments
#
45.78
# cmmt1
# aaa
0.056
0.67
# another cmmt
# aaa
345.
0.78
99.
2.34
# last one
# aaa
65.7
0.9
Using file input/output:
import re
def fn():
    add_data = ["# cmmt1\n", "# another cmmt\n", "# last one\n"]
    for d in add_data:
        yield d

with open('data.dat', 'r') as f_in, \
     open('data.out', 'w') as f_out:
    f_out.write(re.sub(r'^# aaa', lambda r, f=fn(): next(f) + r.group(0), f_in.read(), flags=re.MULTILINE))
Version 2:
import re
def fn():
    add_data = ["# cmmt1\n", "# another cmmt\n", "# last one\n"]
    # Append the marker itself so the replacement keeps the "# aaa" line intact
    add_data = [s + '# aaa' for s in add_data]
    for d in add_data:
        yield d

with open('data.dat', 'r') as f_in, \
     open('data.out', 'w') as f_out:
    f_out.write(re.sub(r'^# aaa', lambda r, f=fn(): next(f), f_in.read(), flags=re.MULTILINE))
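The same substitution can be wrapped in a small function that works on a string, which makes it easy to try without touching the disk. This is only a sketch: insert_comments is a hypothetical name, the marker parameter is an addition for illustration, and the code assumes the list holds at least as many comments as there are "# aaa" lines (otherwise next raises StopIteration).

```python
import re

def insert_comments(text, comments, marker="# aaa"):
    """Return text with one comment inserted above each line starting with marker."""
    pending = iter(comments)
    pattern = re.compile("^" + re.escape(marker), flags=re.MULTILINE)
    # The callback consumes one comment per match and keeps the matched marker.
    return pattern.sub(lambda m: next(pending) + m.group(0), text)

# Shortened version of the sample data, for illustration only
sample = "45.78\n# aaa\n0.056\n# aaa\n345.\n"
result = insert_comments(sample, ["# cmmt1\n", "# another cmmt\n"])
print(result)
```

Using iter on the list plays the same role as the fn generator in the snippets above.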
Answer 1 (score: 1)
According to Jan-Philip Gehrcke's response here, you should reduce the number of write calls.
To do that, you can simply change:
with open("data_final.dat", mode="w") as f:
    for item in data:
        f.write("{}".format(item))
to:
with open("data_final.dat", mode="w") as f:
    f.write("".join(data))
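Another cost in the original approach is list.insert, which shifts every later element on each call. One possible way to avoid both that and the repeated write calls is to build the output list in a single pass and join it once. The sketch below is an illustration that has not been benchmarked; build_output is a hypothetical helper name and the sample lines are a shortened version of the data in the question.

```python
def build_output(lines, comments, marker="# aaa"):
    """Return the file content as one string, with a comment before each marker line."""
    pending = iter(comments)
    out = []
    for line in lines:
        if line.startswith(marker):
            out.append(next(pending))  # comment goes directly above the marker line
        out.append(line)
    return "".join(out)  # one string, so a single write() call is enough

# Shortened sample, for illustration only
lines = ["45.78\n", "# aaa\n", "0.056\n", "# aaa\n", "345.\n"]
text = build_output(lines, ["# cmmt1\n", "# another cmmt\n"])
print(text)
```

The returned string can then be passed to f.write in one call, as in the snippet above.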
Answer 2 (score: 1)
When I need to change data in a text file, I try to read with one file handle and write immediately with a second one.
def add_comments(input_file_name, output_file_name, list_of_comments):
    comments = iter(list_of_comments)  # or itertools.cycle(list_of_comments)
    with open(input_file_name) as fin, open(output_file_name, 'w') as fout:
        for line in fin:
            if line.startswith("# aaa"):
                fout.write(next(comments))
            fout.write(line)
For your example, you would call it like this:
add_comments("data.dat", "final_data.dat", ["# cmmt1\n", "# another cmmt\n", "# last one\n"])
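To check the function without touching real data, it can be run against a temporary directory. The following is a sketch that repeats the function so the block runs on its own; the file contents are a shortened version of the sample, and the temporary paths are purely illustrative.

```python
import os
import tempfile

def add_comments(input_file_name, output_file_name, list_of_comments):
    comments = iter(list_of_comments)
    with open(input_file_name) as fin, open(output_file_name, "w") as fout:
        for line in fin:
            if line.startswith("# aaa"):
                fout.write(next(comments))  # comment goes above the marker line
            fout.write(line)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.dat")
    dst = os.path.join(tmp, "final_data.dat")
    # Write a small input file to process
    with open(src, "w") as f:
        f.write("# Some comment\n45.78\n# aaa\n0.056\n# aaa\n345.\n")
    add_comments(src, dst, ["# cmmt1\n", "# another cmmt\n"])
    with open(dst) as f:
        result = f.read()

print(result)
```

Because the input is read and written one line at a time, only a single line is held in memory at once, which suits large files.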