Question

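I am trying to read sections of a file into numpy arrays. The different sections of the file have similar start and stop flags. I have found a method that works, but it only handles one section of the input file before the file has to be reopened.

My current code is: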

    with open("myFile.txt") as f:
        array = []
        parsing = False
        for line in f:
            if line.startswith('stop flag'):
            parsing = False
        if parsing:
            #do things to the data
        if line.startswith('start flag'):
            parsing = True

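I found this code in another question.

With this code I need to reopen and reread the file.

Is there a way to read all the sections without opening the file again for each section I want to read?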

Answer 1

Each time you reach a start flag, you can use itertools.takewhile to read until the stop flag:

from itertools import takewhile

with open("myFile.txt") as f:
    array = []
    for line in f:
        if line.startswith('start flag'):
            # takewhile is lazy: consume data (e.g. with list()) before the outer loop moves on
            data = takewhile(lambda x: not x.startswith("stop flag"), f)
            # use data and repeat

Or just use an inner loop:

with open("myFile.txt") as f:
    array = []
    for line in f:
        if line.startswith('start flag'):
            # beginning of section, use the first line
            for line in f:
                # check for end of section, breaking if we find the stop line
                if line.startswith("stop flag"):
                    break
                # else process lines from section

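A file object returns its own iterator, so the pointer keeps advancing as you iterate over f. When you reach a start flag, you start processing a section and continue until you hit the stop flag. There is no reason to reopen the file at all: iterating over the lines of the file once is enough to use every section. If the start and stop flag lines are considered part of the section, make sure to use them as well.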
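As a minimal sketch of the single-pass idea described above, the following collects every section into its own list. The sample text, the `io.StringIO` object standing in for the file, and the `read_sections` helper name are assumptions made for illustration; only the flag strings come from the question.

```python
import io


def read_sections(f, start="start flag", stop="stop flag"):
    """Collect the lines between each start/stop pair in one pass over f."""
    sections = []
    for line in f:
        if line.startswith(start):
            current = []
            # The inner loop shares the same iterator, so it resumes on the
            # line right after the start flag.
            for line in f:
                if line.startswith(stop):
                    break
                current.append(line.rstrip("\n"))
            sections.append(current)
    return sections


# Hypothetical sample input standing in for myFile.txt
sample = io.StringIO(
    "header\n"
    "start flag\n1 2 3\n4 5 6\nstop flag\n"
    "noise\n"
    "start flag\n7 8 9\nstop flag\n"
)
sections = read_sections(sample)
print(sections)
```

Each inner list holds the raw lines of one section, so it can be parsed and converted to a numpy array afterwards, without ever reopening the file.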

Answer 2

You have an indentation problem. Your code should look like this:

with open("myFile.txt") as f:
    array = []
    parsing = False
    for line in f:
        if line.startswith('stop flag'):
            parsing = False
        if parsing:
            pass  # do things to the data
        if line.startswith('start flag'):
            parsing = True

Answer 3

A solution similar to yours is:

result = []
parse = False
with open("myFile.txt") as f:
    for line in f:
        if line.startswith('stop flag'):
            parse = False
        elif line.startswith('start flag'):
            parse = True
        elif parse:
            result.append(line)
        else:  # not needed, but I like to always add else clause
            continue
print(result)

But you can also use an inner loop or itertools.takewhile, as in the other answers. For really large files, using takewhile in particular should be noticeably faster.
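To make the takewhile variant concrete, here is a small sketch that gathers every section in one pass. The sample text, the `io.StringIO` stand-in for the file, and the `not_stop` helper are assumptions for illustration. Note that takewhile consumes and discards the first line that fails the test, which here is the stop flag line.

```python
import io
from itertools import takewhile


def not_stop(line):
    # True for every line that still belongs to the current section
    return not line.startswith("stop flag")


# Hypothetical sample input standing in for myFile.txt
f = io.StringIO(
    "start flag\na\nb\nstop flag\n"
    "skipped\n"
    "start flag\nc\nstop flag\n"
)

result = []
for line in f:
    if line.startswith("start flag"):
        # The list comprehension drains the lazy takewhile object before the
        # outer loop continues reading from the same iterator.
        result.append([s.strip() for s in takewhile(not_stop, f)])

print(result)
```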

Answer 4

Suppose this is your file:

**starting**
blabla
blabla
**starting**
bleble
bleble
**starting**
bumbum
bumbum

This is the code of the program:

file = open("testfile.txt", "r")
data = file.read()
file.close()  # the parentheses are required to actually close the file
data = data.split("**starting**")
print(data)

This is the output:

['', '\nblabla\nblabla\n', '\nbleble\nbleble\n', '\nbumbum\nbumbum']

Afterwards you can del the empty elements, or do other things with data. split is a built-in method of string objects and can take more complex strings as its argument.
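As a possible follow-up step, the sketch below drops the empty elements and breaks each chunk into its lines. The inline `data` string mirrors the sample file from this answer and is an assumption for illustration, as is the `sections` name. Keep in mind that this approach reads the whole file into memory at once.

```python
# Sample text mirroring the file in this answer (assumed for illustration)
data = (
    "**starting**\nblabla\nblabla\n"
    "**starting**\nbleble\nbleble\n"
    "**starting**\nbumbum\nbumbum"
)

chunks = data.split("**starting**")
# Skip chunks that are empty or whitespace-only, then split the rest into lines
sections = [chunk.strip().splitlines() for chunk in chunks if chunk.strip()]
print(sections)
```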

Reading multiple file chunks between start and stop flags

4 answers: