Question

test.txt will be

1
2
3
start
4
5
6
end
7
8
9

I would like the result to be

start
4
5
6
end

This is my code

file = open('test.txt','r')

line = file.readline()

start_keyword = 'start'
end_keyword = 'end'

lines = []

while line: 
    line = file.readlines() 
    for words_in_line in line: 
        if start_keyword in words_in_line:
            lines.append(words_in_line)

file.close()

print(lines)

which returns

['start\n']

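I don't know what to add to the code above to get the result I want. I have been searching and changing the code around, but I can't work out how to make it behave the way I want.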

Answer 1

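You can use a flag that is set to true when start_keyword is encountered. While the flag is set, add each line to the lines list, and unset the flag when end_keyword is encountered (but only after the end_keyword line itself has been written to the lines list).

Also use .strip() on words_in_line to remove the \n (and any other leading and trailing whitespace) if you do not want it in the lines list. If you do want it, don't strip the line.

Example -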

flag = False
for words_in_line in line: 
    if start_keyword in words_in_line:
        flag = True
    if flag:
        lines.append(words_in_line.strip())
    if end_keyword in words_in_line:
        flag = False

Note that this adds every start ... end block in the file to the lines list, which I am guessing is what you want.
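The fragment above assumes that line already holds every line of the file. As a minimal self-contained sketch of the same flag logic, the version below wraps it in a hypothetical helper named extract_blocks and runs it on an in-memory list instead of test.txt; both the helper name and the sample list are assumptions made for illustration.

```python
def extract_blocks(all_lines, start_keyword='start', end_keyword='end'):
    """Collect every line from a start line through the next end line."""
    collected = []
    flag = False
    for text in all_lines:
        if start_keyword in text:
            flag = True
        if flag:
            collected.append(text.strip())
        # Unset the flag only after the end line itself has been stored.
        if end_keyword in text:
            flag = False
    return collected


# Stand-in for the contents of test.txt (assumed, not read from disk).
sample = ['1\n', '2\n', '3\n', 'start\n', '4\n', '5\n', '6\n',
          'end\n', '7\n', '8\n', '9\n']
print(extract_blocks(sample))  # ['start', '4', '5', '6', 'end']
```

Because the flag is checked on every line, a second start ... end block later in the input would be appended to the same list.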

Answer 2

Use a flag. Try this:

file = open('test.txt','r')

start_keyword = 'start'
end_keyword = 'end'
in_range = False
entities = []

lines = file.readlines()

for line in lines:

    line = line.strip()

    if line == start_keyword:
        in_range = True
    elif line == end_keyword:
        in_range = False

    elif in_range:
        entities.append(line)

file.close()

# If you want to include the start/end tags
#entities = [start_keyword] + entities + [end_keyword]

print(entities)

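Regarding your code, note that readlines already reads all the lines in the file, so calling readline first does not make much sense unless you want to skip the first line. Also, use strip to remove the EOL characters from the strings. Note that your code is not doing what you expect: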

# Reads ALL lines in the file as an array
line = file.readlines() 

# You are not iterating words in a line, but rather all lines one by one
for words_in_line in line:

    # If a given line contains 'start', append it. This is why you only get ['start\n'], it's the only line you are adding as no other line contains that string
    if start_keyword in words_in_line:
        lines.append(words_in_line)
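To see why the while loop in the question does its work only once, it may help to watch what readline and readlines return in sequence. The sketch below uses io.StringIO as an in-memory stand-in for test.txt, which is an assumption made so that it runs without the file.

```python
import io

# In-memory stand-in for test.txt (assumed contents from the question).
fake_file = io.StringIO('1\n2\n3\nstart\n4\n5\n6\nend\n7\n8\n9\n')

first = fake_file.readline()    # consumes only the first line
rest = fake_file.readlines()    # consumes every remaining line as a list
again = fake_file.readlines()   # nothing is left, so this is an empty list

print(repr(first))   # '1\n'
print(len(rest))     # 10
print(again)         # []
```

The empty list from the second readlines call is falsy, which is what ends the while line: loop in the question after a single useful pass.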

Answer 3

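You need a state variable that decides whether or not to store the lines. Here is a simple example that always stores the line, then changes its mind and discards it in the cases you don't want: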

start_keyword = 'start'
end_keyword = 'end'

lines = []
reading = False
with open('test.txt', 'r') as f:
    for line in f:
        lines.append(line)
        if start_keyword in line:
            reading = True
        elif end_keyword in line:
            reading = False
        elif not reading:
            lines.pop()

print(''.join(lines))
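One caveat about the in test used here: it matches substrings, so a line that merely contains the keyword also counts as a marker. The short sketch below compares it with an exact match on the stripped line; the sample lines 'restart' and 'weekend' are invented for illustration and do not appear in the question's file.

```python
# Hypothetical lines, two of which only contain a keyword as a substring.
candidates = ['start\n', 'restart\n', 'end\n', 'weekend\n']

# Substring test: every line containing 'start' or 'end' anywhere matches.
substring_hits = [c for c in candidates if 'start' in c or 'end' in c]

# Exact test: only lines that are the keyword once whitespace is stripped.
exact_hits = [c for c in candidates if c.strip() in ('start', 'end')]

print(substring_hits)  # ['start\n', 'restart\n', 'end\n', 'weekend\n']
print(exact_hits)      # ['start\n', 'end\n']
```

For the file in the question both tests give the same result, so which one to use depends on what other lines the real data may contain.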

Answer 4

If the file is not too big (relative to how much RAM your computer has):

start = 'start'
end = 'end'
with open('test.txt','r') as f:
    content = f.read()
result = content[content.index(start):content.index(end)]

You can then print it with print(result), create a list with result.split(), and so on.

If there are multiple start/stop points, and/or the file is very large:

start = 'start'
end = 'end'
running = False
result = []
with open('test.txt','r') as f:
    for line in f:
        if start in line:
            running = True
            result.append(line)
        elif end in line:
            running = False
            result.append(line)
        elif running:
            result.append(line)

This leaves you with a list, which you can join(), write to a file, and so on.
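Note that the slice in the whole-file approach stops where the end marker begins, so the marker itself is not part of result. If it should be included, as in the desired output in the question, one possible adjustment is sketched below. The helper name slice_block is hypothetical, and the text is an in-memory stand-in for the file's contents.

```python
def slice_block(content, start='start', end='end'):
    """Return the text from the first start marker through the next end marker."""
    begin = content.index(start)          # raises ValueError if start is missing
    finish = content.index(end, begin)    # search for end only after start
    return content[begin:finish + len(end)]


# Stand-in for f.read() on test.txt (assumed contents from the question).
content = '1\n2\n3\nstart\n4\n5\n6\nend\n7\n8\n9\n'
print(slice_block(content).split())  # ['start', '4', '5', '6', 'end']
```

Passing begin as the second argument to index keeps an end that happens to appear before start from producing an empty slice.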

Answer 5

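A file object is its own iterator, so you don't need a while loop to read a file line by line; you can iterate over the file object itself. To capture the sections, start an inner loop whenever you encounter a line containing start, and break out of the inner loop when you hit end: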

start = 'start'
end = 'end'

with open("in.txt") as f:
    out = []
    for line in f:
        if start in line:
            out.append(line)
            # The inner loop continues reading from the same file object
            for _line in f:
                out.append(_line)
                if end in _line:
                    break

Output:

['start\n', '4\n', '5\n', '6\n', 'end\n']
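The nested loops work because both of them pull from the same iterator, so the inner loop resumes on the line right after start and the outer loop resumes on the line right after end. The sketch below shows this on a plain list with two sections. The helper name capture and the sample data are assumptions, and iter() is called explicitly because a list, unlike a file object, is not its own iterator.

```python
def capture(lines_source, start='start', end='end'):
    """Collect each start..end section by sharing one iterator between two loops."""
    it = iter(lines_source)
    out = []
    for line in it:
        if start in line:
            out.append(line)
            # Same iterator: this loop picks up on the line after start.
            for inner in it:
                out.append(inner)
                if end in inner:
                    break
    return out


# Hypothetical input with two sections.
sample = ['1\n', 'start\n', '4\n', 'end\n', '7\n',
          'start\n', '8\n', 'end\n', '9\n']
print(capture(sample))
# ['start\n', '4\n', 'end\n', 'start\n', '8\n', 'end\n']
```

If a section is never closed, the inner loop simply runs to the end of the input and everything after the last start is kept.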

How do I find a substring in a line and append from that line up to the next substring?
