The contents of test.txt are:
1
2
3
start
4
5
6
end
7
8
9
I want the result to be:
start
4
5
6
end
Here is my code:
file = open('test.txt','r')
line = file.readline()
start_keyword = 'start'
end_keyword = 'end'
lines = []
while line:
    line = file.readlines()
    for words_in_line in line:
        if start_keyword in words_in_line:
            lines.append(words_in_line)
file.close()
print(lines)
This returns:
['start\n']
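I don't know what to add to the code above to get the result I want. I have been searching and changing the code, but I can't figure out how to make it work the way I want.
Answer 0 (score: 1)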
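You can use a flag that is set to True when start_keyword is encountered. While the flag is set, append each line to the lines list, and unset the flag when end_keyword is encountered (but only after the end_keyword line has been written to the lines list).
Also call .strip() on words_in_line to remove the \n (and any other leading and trailing whitespace) if you don't want it in the lines list. If you do want it, don't strip.
Example: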
flag = False
for words_in_line in line:
    if start_keyword in words_in_line:
        flag = True
    if flag:
        lines.append(words_in_line.strip())
    if end_keyword in words_in_line:
        flag = False
Note that if the file contains several start-to-end blocks, this adds all of them to the lines list, which I am guessing is what you want.
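For anyone who wants to try the flag approach without creating a file, here is a minimal self-contained sketch. The function name extract_blocks and the in-memory sample are assumptions made for illustration; only the flag logic comes from the answer above.

```python
import io

def extract_blocks(lines, start_keyword='start', end_keyword='end'):
    """Collect every line from a start marker through its end marker."""
    collected = []
    flag = False
    for raw_line in lines:
        if start_keyword in raw_line:
            flag = True                      # entering a block
        if flag:
            collected.append(raw_line.strip())
        if end_keyword in raw_line:
            flag = False                     # leave only after the end line was stored
    return collected

# In-memory stand-in for test.txt (assumed to match the question's content).
sample = io.StringIO("1\n2\n3\nstart\n4\n5\n6\nend\n7\n8\n9\n")
result = extract_blocks(sample)
print(result)  # ['start', '4', '5', '6', 'end']
```

Because the test is a substring check (`in`), a line such as "restart" would also switch the flag on. Comparing the stripped line with `==` is stricter if the markers always sit on their own line.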
Answer 1 (score: 1)
Use a flag. Try this:
file = open('test.txt','r')
start_keyword = 'start'
end_keyword = 'end'
in_range = False
entities = []
lines = file.readlines()
for line in lines:
    line = line.strip()
    if line == start_keyword:
        in_range = True
    elif line == end_keyword:
        in_range = False
    elif in_range:
        entities.append(line)
file.close()
# If you want to include the start/end tags
#entities = [start_keyword] + entities + [end_keyword]
print(entities)
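Regarding your code, note that readlines already reads all the remaining lines in the file, so calling readline first does not make much sense unless you want to skip the first line. You can also use strip to remove the EOL characters from a string. Note that your code does not do what you expect: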
# Reads ALL lines in the file as an array
line = file.readlines()
# You are not iterating words in a line, but rather all lines one by one
for words_in_line in line:
    # If a given line contains 'start', append it. This is why you only get ['start\n'], it's the only line you are adding as no other line contains that string
    if start_keyword in words_in_line:
        lines.append(words_in_line)
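To see how readline and readlines interact, here is a small sketch that uses io.StringIO as a stand-in for the open file (the sample content is an assumption, shortened from the question's file):

```python
import io

f = io.StringIO("1\n2\n3\nstart\n")

first = f.readline()    # consumes only the first line
rest = f.readlines()    # consumes everything that is left, as a list
again = f.readlines()   # nothing left to read, so this is an empty list

print(repr(first))  # '1\n'
print(rest)         # ['2\n', '3\n', 'start\n']
print(again)        # []
```

This is why the while loop in the question runs its body twice: the second readlines call returns an empty list, which is falsy, and the loop stops.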
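Answer 2 (score: 1)
You need a state variable to decide whether to store the lines. Here is a simple example that always stores the line first, then changes its mind and discards it in the cases where you don't want it: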
start_keyword = 'start'
end_keyword = 'end'
lines = []
reading = False
with open('test.txt', 'r') as f:
    for line in f:
        lines.append(line)
        if start_keyword in line:
            reading = True
        elif end_keyword in line:
            reading = False
        elif not reading:
            lines.pop()
print(''.join(lines))
Answer 3 (score: 1)
If the file is not too large (relative to how much RAM your computer has):
start = 'start'
end = 'end'
with open('test.txt','r') as f:
    content = f.read()
result = content[content.index(start):content.index(end)]
You can then print it with print(result), create a list with result.split(), and so on.
If there are multiple start/stop points, and/or the file is very large:
start = 'start'
end = 'end'
running = False
result = []
with open('test.txt','r') as f:
    for line in f:
        if start in line:
            running = True
            result.append(line)
        elif end in line:
            running = False
            result.append(line)
        elif running:
            result.append(line)
This leaves you with a list, which you can join(), write to a file, and so on.
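One detail about the first snippet in this answer: the slice content[content.index(start):content.index(end)] stops just before the end keyword, so 'end' itself is not part of the result. If the end line should be included, as in the question's desired output, one possible variation looks like this. The inline content string and the names begin and stop are assumptions for illustration.

```python
# Stand-in for f.read() on test.txt (assumed to match the question's content).
content = "1\n2\n3\nstart\n4\n5\n6\nend\n7\n8\n9\n"
start = 'start'
end = 'end'

begin = content.index(start)
# Search for the end keyword only after the start, then extend the slice past it.
stop = content.index(end, begin) + len(end)

result = content[begin:stop]
print(result.split())  # ['start', '4', '5', '6', 'end']
```

Note that str.index raises ValueError when a keyword is missing, so a file without both markers needs a try/except or a check with str.find.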
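Answer 4 (score: 0)
A file object is its own iterator. You don't need a while loop to read a file line by line; you can iterate over the file object itself. To capture the sections, start an inner loop as soon as you hit a line containing start, and break out of the inner loop when you hit end: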
start = 'start'
end = 'end'
with open("in.txt") as f:
    out = []
    for line in f:
        if start in line:
            out.append(line)
            # The inner loop continues on the same file iterator
            for _line in f:
                out.append(_line)
                if end in _line:
                    break
print(out)
Output:
['start\n', '4\n', '5\n', '6\n', 'end\n']
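The nested loops work because both of them pull lines from the same iterator, so whatever the inner loop consumes is never seen by the outer one. A minimal sketch of that behavior, using io.StringIO as a stand-in for the file (the sample content and the seen_by_outer list are assumptions added for illustration):

```python
import io

f = io.StringIO("1\nstart\n4\nend\n7\n")
start = 'start'
end = 'end'

seen_by_outer = []   # lines handed to the outer loop
out = []             # lines captured between the markers

for line in f:
    seen_by_outer.append(line)
    if start in line:
        out.append(line)
        for _line in f:          # continues where the outer loop stopped
            out.append(_line)
            if end in _line:
                break

print(seen_by_outer)  # ['1\n', 'start\n', '7\n']
print(out)            # ['start\n', '4\n', 'end\n']
```

This only works with an iterator such as a file object. Looping over a list returned by readlines() in both loops would start the inner loop again from the first element.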