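I want to use Python to extract a specific part of letters from a .txt file. The beginning and end of each letter are marked by clear start/end expressions (letter_begin / letter_end). My problem is that the "recording" of the text needs to start at the first occurrence of any item in the letter_begin list and stop at the last occurrence of any item in the letter_end list (plus a 3-line buffer). I want to write the output text to a file. Here is my sample text and my code so far: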
sample_text = """Some random text right here
.........
Dear Shareholders: We are pleased to provide this report to our shareholders and fellow shareholders. we thank you for your continued support.
Best regards,
Douglas - Director
Other random text in this lines """
letter_begin = ["dear", "to our shareholders", "fellow shareholders"]
letter_end = ["best regards", "respectfully submitted", "thank you for your continued support"]
filename = "input.txt"  # hypothetical path to the source text file
with open(filename, "r", encoding="utf-8") as infile, open("xyz.txt", mode="w", encoding="utf-8") as f:
    text = infile.read()
    lines = text.strip().split("\n")
    target_start_idx = None
    target_end_idx = None
    for index, line in enumerate(lines):
        line = line.lower()  # compare case-insensitively; the keywords are lowercase
        if any(beg in line for beg in letter_begin):
            target_start_idx = index
            continue
        if any(end in line for end in letter_end):
            target_end_idx = index + 3  # 3-line buffer after the closing phrase
            break
    if target_start_idx is not None:
        target = "\n".join(lines[target_start_idx:target_end_idx])
        f.write(target)
My desired output would be:
output = """Dear Shareholders: We are pleased to provide this report to our shareholders and fellow shareholders. we thank you for your continued support.
Best regards,
Douglas - Director
"""
Answer (score: 0)
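Your loop overwrites target_start_idx every time a line contains an opening phrase, so you end up with the last occurrence of an opening sequence (before the first closing phrase) instead of the first one.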
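A minimal toy sketch of the difference, using hypothetical data rather than the asker's text: an index that is reassigned on every match ends up pointing at the last match, while one that is set once followed by `break` points at the first.

```python
# Hypothetical toy data: three lines that each contain an opening phrase.
lines = ["dear a", "dear b", "dear c"]

# Reassigning inside the loop without break keeps overwriting the index,
# so the LAST matching line wins.
last_idx = None
for index, line in enumerate(lines):
    if "dear" in line:
        last_idx = index

# Breaking on the first match stops the loop, so the FIRST matching line wins.
first_idx = None
for index, line in enumerate(lines):
    if "dear" in line:
        first_idx = index
        break

print(last_idx, first_idx)  # → 2 0
```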
You should split the search into two loops, like this:
with open(filename, "r", encoding="utf-8") as infile:
    text = infile.read()
lines = text.strip().split("\n")
target_start_idx = None
target_end_idx = None
# First loop: stop at the first line that contains any opening phrase.
for index, line in enumerate(lines):
    line = line.lower()
    if any(beg in line for beg in letter_begin):
        target_start_idx = index
        break
# Second loop: no break, so the last line containing a closing phrase wins.
for index, line in enumerate(lines):
    if any(end in line.lower() for end in letter_end):  # lower() so "Best regards" matches
        target_end_idx = index + 3
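Put together with the file writing from the question, the two-loop approach can be sketched as a small function. This is an illustration, not the answerer's code: the names `extract_letter` and `extract_letter_file` and the `buffer` parameter are hypothetical, and the second loop starts at the opening line (an assumption) so a closing phrase that appears before the letter is ignored.

```python
def extract_letter(text, begin_phrases, end_phrases, buffer=3):
    """Return the lines from the first opening phrase up to (but not including)
    the line `buffer` lines after the last closing phrase."""
    lines = text.strip().split("\n")
    lowered = [line.lower() for line in lines]  # case-insensitive matching

    # First loop: break at the first opening phrase.
    start = None
    for index, line in enumerate(lowered):
        if any(phrase in line for phrase in begin_phrases):
            start = index
            break
    if start is None:
        return ""  # no opening phrase found

    # Second loop: no break, so the last closing phrase at or after `start` wins.
    end = None
    for index in range(start, len(lowered)):
        if any(phrase in lowered[index] for phrase in end_phrases):
            end = index + buffer

    # If no closing phrase was found, end stays None and the slice runs to the end.
    return "\n".join(lines[start:end])


def extract_letter_file(in_path, out_path, begin_phrases, end_phrases, buffer=3):
    """Read in_path, extract the letter, write it to out_path, and return it."""
    with open(in_path, "r", encoding="utf-8") as infile:
        text = infile.read()
    letter = extract_letter(text, begin_phrases, end_phrases, buffer)
    with open(out_path, "w", encoding="utf-8") as outfile:
        outfile.write(letter)
    return letter


sample_text = """Some random text right here
.........
Dear Shareholders: We are pleased to provide this report to our shareholders and fellow shareholders. we thank you for your continued support.
Best regards,
Douglas - Director
Other random text in this lines """
letter_begin = ["dear", "to our shareholders", "fellow shareholders"]
letter_end = ["best regards", "respectfully submitted", "thank you for your continued support"]

print(extract_letter(sample_text, letter_begin, letter_end, buffer=2))
```

Because the slice end is exclusive, `index + 3` keeps the closing line plus the two lines after it, so on the sample text it also picks up `Other random text in this lines`; `buffer=2` yields exactly the desired output (without its trailing newline).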
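With two separate loops, the first loop exits as soon as it hits the first occurrence of an opening sequence, while the second loop has no break, so it runs to the end and keeps the last occurrence of a closing sequence.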
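As a variation (not part of the original answer), both searches can be written with the built-in `next()` and a default of `None`: the first opening phrase comes from a forward scan, and the last closing phrase from a backward scan that stops at its first hit. The helper name `find_bounds` and the short `lines` list are hypothetical.

```python
def find_bounds(lines, begin_phrases, end_phrases, buffer=3):
    """Return (start, end) slice bounds, or (None, None) if no opening phrase."""
    lowered = [line.lower() for line in lines]
    # next() returns the first matching index, or None if nothing matches.
    start = next(
        (i for i, line in enumerate(lowered) if any(p in line for p in begin_phrases)),
        None,
    )
    if start is None:
        return None, None
    # Scanning indices in reverse, the first hit is the last closing phrase.
    last_end = next(
        (i for i in range(len(lowered) - 1, start - 1, -1)
         if any(p in lowered[i] for p in end_phrases)),
        None,
    )
    end = None if last_end is None else last_end + buffer
    return start, end


lines = [
    "Some random text right here",
    "Dear Shareholders: ...",
    "Best regards,",
    "Douglas - Director",
    "Other random text",
]
print(find_bounds(lines, ["dear"], ["best regards"], buffer=2))  # → (1, 4)
```

The backward scan avoids walking the whole remainder of the file when the closing phrase sits near the end; `lines[1:4]` here is the opening line, `Best regards,` and `Douglas - Director`.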