I need to extract values from the text file below:
fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk
The values I need to extract run from Start to End.
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        elif line.strip() == "End":
            copy = False
        elif copy:
            outfile.write(line)
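The code above, which I am using, comes from this question: Extract Values between two strings in a text file using python
This code does not include the strings "Start" and "End". How would you include the outer strings?
Answer 0 (score: 2)
@en_Knight is almost right. Here is a solution that meets the OP's request that the delimiters be included in the output: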
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy:
            outfile.write(line)
        # move this AFTER the "if copy"
        if line.strip() == "End":
            copy = False
Or simply add a write() call wherever it applies:
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            outfile.write(line)  # add this
            copy = True
        elif line.strip() == "End":
            outfile.write(line)  # add this
            copy = False
        elif copy:
            outfile.write(line)
Update: to answer the question in the comments about "only using an 'End' that comes after a 'Start'", change the last elif line.strip() == "End" branch to:
        elif line.strip() == "End" and copy:
            outfile.write(line)  # add this
            copy = False
This covers the case of a single "Start" but multiple "End" lines... which sounds odd, but it is what the asker requested.
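The variants above can be tried without touching the filesystem. The following is a minimal sketch, assuming a hypothetical helper named extract_block (not part of the original answer) that accepts any iterable of lines, such as an open file or a list of strings:

```python
def extract_block(lines, start="Start", end="End"):
    """Return every line inside a start..end block, markers included.

    extract_block is a hypothetical helper name used for illustration.
    """
    kept = []
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True
        # Collecting happens between the two marker checks, so the start
        # and end lines themselves are kept.
        if copy:
            kept.append(line)
        if stripped == end:
            copy = False
    return kept


sample = [
    "fdsjhgjhg\n",
    "End\n",  # stray "End" before any "Start": ignored
    "Start\n",
    "Good Morning\n",
    "Hello World\n",
    "End\n",
    "End\n",  # extra "End" after the block closed: ignored
    "dsfjkhk\n",
]
print("".join(extract_block(sample)), end="")
```

Because a line is only kept while copy is true, an "End" that appears outside a block is never collected. With real files, outfile.writelines(extract_block(infile)) would produce the same result as the loop versions.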
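Answer 1 (score: 1)
"elif" means "do this only if the other cases failed". It is equivalent to "else if", if you're coming from a different C-like language. Without it, execution falls through to the later checks, so both "Start" and "End" are included.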
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy:  # flipped to include end, as Dan H pointed out
            outfile.write(line)
        if line.strip() == "End":
            copy = False
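To make the difference between the two control flows concrete, here is a small sketch that records which branches run for a single line. The function names branches_elif and branches_if are hypothetical and exist only for this illustration:

```python
def branches_elif(line, copy):
    """Branches that run in the if/elif chain (at most one per line)."""
    fired = []
    if line == "Start":
        fired.append("start")
    elif line == "End":
        fired.append("end")
    elif copy:
        fired.append("write")
    return fired


def branches_if(line, copy):
    """Branches that run with independent if statements."""
    fired = []
    if line == "Start":
        fired.append("start")
        copy = True
    if copy:
        fired.append("write")
    if line == "End":
        fired.append("end")
    return fired


print(branches_elif("Start", False))  # marker handled, nothing written
print(branches_if("Start", False))    # marker handled and line written
print(branches_elif("End", True))     # marker handled, nothing written
print(branches_if("End", True))       # line written, then marker handled
```

In the chained version a marker line can only ever trigger its own branch, which is why the markers are missing from the output. With separate if statements the write check runs for every line, marker or not.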
Answer 2 (score: 1)
Regex approach:
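import re

with open('input.txt') as f:
    data = f.read()

# re.M lets ^ and $ anchor at line boundaries; re.S lets .*? span newlines.
# Anchoring on line boundaries also matches when "Start" is the first line
# or "End" is the last line of the file.
match = re.search(r'^Start$.*?^End$', data, re.M | re.S)
if match:
    with open('output.txt', 'w') as f:
        f.write(match.group(0))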
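If the input may contain more than one Start/End block, the regex idea extends naturally with re.findall. This is only a sketch, assuming each marker sits alone on its own line with no surrounding whitespace; find_blocks is a hypothetical helper name:

```python
import re

# ^ and $ match at line boundaries under re.M; re.S lets .*? cross newlines.
BLOCK_RE = re.compile(r'^Start$.*?^End$', re.M | re.S)


def find_blocks(text):
    """Return every Start..End block in text, markers included."""
    return BLOCK_RE.findall(text)


data = "junk\nStart\nGood Morning\nEnd\nmore junk\nStart\nHello World\nEnd\n"
for block in find_blocks(data):
    print(block)
    print("---")
```

The lazy quantifier .*? stops at the first "End" line after each "Start", so consecutive blocks are returned separately rather than merged into one match.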