How do I find a substring in a line and append from that line up to the next substring?

Time: 2015-07-23 04:20:45

Tags: python file substring line

test.txt contains:

1
2
3
start
4
5
6
end
7
8
9

I want the result to be:

start
4
5
6
end

Here is my code:

file = open('test.txt','r')

line = file.readline()

start_keyword = 'start'
end_keyword = 'end'

lines = []

while line: 
    line = file.readlines() 
    for words_in_line in line: 
        if start_keyword in words_in_line:
            lines.append(words_in_line)

file.close()

print lines

It returns:

['start\n']

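I don't know what to add to the code above to get the result I want. I've been searching and changing the code, but I can't figure out how to make it work the way I want.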

5 answers:

Answer 0 (score: 1)

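You can use a flag that is set to true when start_keyword is encountered. While the flag is set, add each line to the lines list, and unset the flag when end_keyword is encountered (but only after the end_keyword line has been written to the lines list).

Also call .strip() on words_in_line to remove the \n (and any other leading and trailing whitespace) if you don't want it in the lines list. If you do want it, don't strip.

Example: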

flag = False
for words_in_line in line: 
    if start_keyword in words_in_line:
        flag = True
    if flag:
        lines.append(words_in_line.strip())
    if end_keyword in words_in_line:
        flag = False

Note that this adds every start ... end block in the file to the lines list, which I guess is what you want.
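For reference, the fragment above can be assembled into a self-contained sketch. The function name `extract_blocks` and the in-memory sample are assumptions made for illustration, not part of the question's code; `io.StringIO` stands in for the opened test.txt.

```python
import io

def extract_blocks(lines, start_keyword="start", end_keyword="end"):
    """Collect every line from a start marker through the following end marker."""
    collected = []
    flag = False
    for words_in_line in lines:
        if start_keyword in words_in_line:
            flag = True
        if flag:
            collected.append(words_in_line.strip())
        # Clear the flag only after the end line itself has been stored.
        if end_keyword in words_in_line:
            flag = False
    return collected

# Stand-in for the contents of test.txt (assumed sample data).
sample = io.StringIO("1\n2\n3\nstart\n4\n5\n6\nend\n7\n8\n9\n")
result = extract_blocks(sample)
print(result)
```

Because the check is a substring test, any line that merely contains the keyword (for example "restart") would also toggle the flag; comparing the stripped line with `==` is stricter.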

Answer 1 (score: 1)

Use a flag. Try this:

file = open('test.txt','r')

start_keyword = 'start'
end_keyword = 'end'
in_range = False
entities = []

lines = file.readlines()

for line in lines:

    line = line.strip()

    if line == start_keyword:
        in_range = True
    elif line == end_keyword:
        in_range = False

    elif in_range:
        entities.append(line)

file.close()

# If you want to include the start/end tags
#entities = [start_keyword] + entities + [end_keyword]

print entities

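Regarding your code, note that readlines already reads all the lines in the file, so calling readline first doesn't make much sense unless you want to skip the first line. Also, use strip to remove the EOL characters from the strings. Note that your code does not do what you expect: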

# Reads ALL lines in the file as an array
line = file.readlines() 

# You are not iterating words in a line, but rather all lines one by one
for words_in_line in line:

    # If a given line contains 'start', append it. This is why you only get ['start\n'], it's the only line you are adding as no other line contains that string
    if start_keyword in words_in_line:
        lines.append(words_in_line)
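The behaviour described here can be checked without a file on disk. The following is a small sketch that uses `io.StringIO` as a stand-in for test.txt (an assumption for illustration): `readline()` consumes the first line, `readlines()` returns everything that remains as a list, and a second `readlines()` call returns an empty list, which is what ends the `while` loop in the question.

```python
import io

# In-memory stand-in for test.txt (assumed sample data).
fake_file = io.StringIO("1\n2\nstart\n4\nend\n")

first = fake_file.readline()    # '1\n' : only the first line is consumed
rest = fake_file.readlines()    # every remaining line, as a list of strings
again = fake_file.readlines()   # [] : nothing is left to read

# The membership test runs once per line, so only the line containing 'start' matches.
matches = [entry for entry in rest if "start" in entry]
print(matches)
```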

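Answer 2 (score: 1)

You need a state variable that decides whether to store the lines. Here is a simple example that always stores the line first, then changes its mind and discards it in the cases you don't want: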

start_keyword = 'start'
end_keyword = 'end'

lines = []
reading = False
with open('test.txt', 'r') as f:
    for line in f:
        lines.append(line)
        if start_keyword in line:
            reading = True
        elif end_keyword in line:
            reading = False
        elif not reading:
            lines.pop()

print ''.join(lines)

Answer 3 (score: 1)

If the file is not too large (relative to how much RAM your computer has):

start = 'start'
end = 'end'
with open('test.txt','r') as f:
    content = f.read()
result = content[content.index(start):content.index(end)]

You can then print it with print(result), create a list with result.split(), and so on.
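One detail worth noting: the slice in this answer stops just before the end keyword, so the end marker itself is not part of result. Below is a minimal sketch of a variant that keeps it, assuming each marker appears once and start comes before end. The sample string stands in for the file contents, and `begin_at` and `stop_at` are names introduced here for illustration.

```python
# Stand-in for f.read() on test.txt (assumed sample data).
content = "1\n2\n3\nstart\n4\n5\n6\nend\n7\n8\n9\n"
start = "start"
end = "end"

begin_at = content.index(start)
# Look for the end marker only after the start marker,
# then extend the slice so that it covers the marker itself.
stop_at = content.index(end, begin_at) + len(end)
result = content[begin_at:stop_at]
print(result.split())
```

`str.index` raises `ValueError` when a marker is missing, so a file without both keywords would need a `try`/`except` or `str.find` instead.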

If there are multiple start/stop points, and/or the file is very large:

start = 'start'
end = 'end'
running = False
result = []
with open('test.txt','r') as f:
    for line in f:
        if start in line:
            running = True
            result.append(line)
        elif end in line:
            running = False
            result.append(line)
        elif running:
            result.append(line)

This leaves you with a list, which you can join(), write to a file, and so on.
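If the blocks should stay separate rather than being flattened into one list, the same loop can be adapted. This is a sketch under that assumption; `split_into_blocks` is a hypothetical helper name, and the sample list stands in for the lines of a file.

```python
def split_into_blocks(lines, start="start", end="end"):
    """Return one list per start ... end block, in the order they appear."""
    blocks = []
    current = None  # None means we are currently outside any block
    for line in lines:
        if current is None:
            if start in line:
                current = [line]
        else:
            current.append(line)
            if end in line:
                blocks.append(current)
                current = None
    # A block that never reaches its end marker is dropped.
    return blocks

# Assumed sample data with two blocks.
sample = ["1\n", "start\n", "a\n", "end\n", "2\n", "start\n", "b\n", "end\n"]
blocks = split_into_blocks(sample)
print(blocks)
```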

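Answer 4 (score: 0)

A file object is its own iterator, so you don't need a while loop to read the file line by line; you can iterate over the file object itself. To capture the sections, start an inner loop whenever you hit a line containing start, and break out of the inner loop when you hit end: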

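start = 'start'
end = 'end'

with open("in.txt") as f:
    out = []
    for line in f:
        if start in line:
            out.append(line)
            # The inner loop continues reading from the same file iterator.
            for _line in f:
                out.append(_line)
                if end in _line:
                    break

print(out)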

Output:

['start\n', '4\n', '5\n', '6\n', 'end\n']