Question

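I'm completely new to Python. I use Perl a lot and have heard that Python is often better at parsing text, so I wanted to give it a try, but I can't find the simplest way to do this (for the record, I have already done it in Perl, but it took me several slow, ugly loops):

I want to read a large file and extract the blocks of text that sit between two lines starting with the same pattern, for example: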

!NAME: "N0",                DESCR: "Netnt Etrnet"
!NAME: "cp0",              DESCR: "Cle R0"
!NAME: "slt R1",               DESCR: "RSt"
>>!NAME: "moe R1",             DESCR: "ASessor 1,bps"
>>!PID: A9-55
>>!VID: G0984981
>>!SN: SEDGH25443N51E
!NAME: "SDFGSDFG: FGT/0",       DESCR: "VFDFGX1"
!NAME: "JQFHF1",       DESCR: "VNQDF2"

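Of course, the ">>" is not part of the text file; it is only there to show the lines I want to retrieve.

So to recap: I want to print every block (there are more blocks in the file) whose first line starts with "!NAME" and which has other lines before the next "!NAME".

I don't care about places where two "!NAME:" lines appear in a row.

This is only the first step; later I will try to pull the values out of each block to build a hash (or a dict, or whatever the Python equivalent of a hash is). But I'm already stuck on the first step, so I'm asking for help, haha.

Thanks!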

Answer 1

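with open("in.txt") as f:
    prev = ""  # the line read just before the current one
    for line in f:
        if not line.startswith("!NAME:"):
            # first non-!NAME line: print the header before it, then the line itself
            print(prev.rstrip())
            print(line.rstrip())
            # keep consuming the same file iterator until the next !NAME line
            for line in f:
                if line.startswith("!NAME:"):
                    prev = line
                    break
                print(line.rstrip())
                prev = line
        prev = line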

If you want to store each section, you can use a dict:

from collections import defaultdict
from itertools import count

cn = count()  # yields the keys 0, 1, 2, ... for successive sections

sections = defaultdict(str)
with open("log.txt") as f:
    prev = ""
    for line in f:
        if not line.startswith("!NAME:"):
            key = next(cn)
            # start a new section with the preceding line and this one
            sections[key] += prev
            sections[key] += line
            for line in f:
                if line.startswith("!NAME:"):
                    break
                sections[key] += line
                prev = line
        prev = line

print(sections)
defaultdict(<class 'str'>, {0: '!NAME: "moe R1",             DESCR: "ASessor 1,bps"\n!PID: A9-55\n!VID: G0984981\n!SN: SEDGH25443N51E\n'})

To make sure you only pick up sections that are preceded by a !NAME line, also check that the previous line starts with !NAME:

with open("log.txt") as f:
    prev = ""
    for line in f:
        if not line.startswith("!NAME:") and prev.startswith("!NAME:"):
            key = next(cn)
            sections[key] += prev
            sections[key] += line
            for line in f:
                if line.startswith("!NAME:"):
                    break
                sections[key] += line
                prev = line
        prev = line
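
The same logic can also be wrapped in a small function, which makes it easier to reuse and to test without a file on disk. This is only a sketch under the same assumptions as the snippet above: `collect_blocks` is a hypothetical name, and it takes any iterable of lines (an open file works too) instead of opening log.txt itself.

```python
def collect_blocks(lines, marker="!NAME:"):
    """Return blocks made of a marker line plus the non-marker lines after it."""
    blocks = []
    current = None  # block being built, or None when not inside one
    prev = ""       # previous line, needed to recover the block's header
    for line in lines:
        if line.startswith(marker):
            # A marker line closes any block in progress.
            if current is not None:
                blocks.append(current)
                current = None
        elif current is not None:
            current.append(line.rstrip())
        elif prev.startswith(marker):
            # First non-marker line after a marker: start a block with its header.
            current = [prev.rstrip(), line.rstrip()]
        prev = line
    if current is not None:
        blocks.append(current)  # block that runs to the end of the input
    return blocks


sample = '''!NAME: "N0",                DESCR: "Netnt Etrnet"
!NAME: "cp0",              DESCR: "Cle R0"
!NAME: "slt R1",               DESCR: "RSt"
!NAME: "moe R1",             DESCR: "ASessor 1,bps"
!PID: A9-55
!VID: G0984981
!SN: SEDGH25443N51E
!NAME: "SDFGSDFG: FGT/0",       DESCR: "VFDFGX1"
!NAME: "JQFHF1",       DESCR: "VNQDF2"
'''

for block in collect_blocks(sample.splitlines()):
    print("\n".join(block))
```

With a real file you would pass the file object directly, e.g. `collect_blocks(f)` inside the `with open(...)` block.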

Answer 2

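Alternatively, you can use itertools.

Ignore everything in the file until the first !NAME
Group the lines by whether or not they start with !NAME
Group those runs into pairs, where the first of each pair is the !NAME lines and the second is the lines up to the next !NAME or EOF
Include in the output the last of the !NAME lines whenever it is followed by at least one line that is not a !NAME

Code: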

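# Python 3; in Python 2 zip_longest is named izip_longest
from itertools import dropwhile, groupby, zip_longest

with open('inputfile') as fin:
    stripped = (line.strip() for line in fin)
    # skip everything before the first !NAME line
    start_at = dropwhile(lambda L: not L.startswith('!NAME'), stripped)
    # alternating runs of !NAME lines and non-!NAME lines
    grouped = (list(g) for k, g in groupby(start_at, lambda L: L.startswith('!NAME')))
    # passing the same generator twice pairs each !NAME run with the run after it
    for name, rest in zip_longest(grouped, grouped, fillvalue=[]):
        if rest:
            print(name[-1])
            print('\n'.join(rest))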

Which gives:

!NAME: "moe R1",             DESCR: "ASessor 1,bps"
!PID: A9-55
!VID: G0984981
!SN: SEDGH25443N51E
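
For the follow-up step mentioned in the question (turning a block into a dict), here is a rough sketch. It assumes, based only on the sample lines shown, that every field looks like KEY: "quoted value" or KEY: bare_value; `FIELD` and `block_to_dict` are hypothetical names, and real data may need a stricter pattern.

```python
import re

# Assumed field syntax: KEY: "quoted value"  or  KEY: bare_value
FIELD = re.compile(r'(\w+):\s*(?:"([^"]*)"|([^\s,]+))')


def block_to_dict(block_lines):
    """Collect every KEY/value pair found in the lines of one block."""
    record = {}
    for line in block_lines:
        for key, quoted, bare in FIELD.findall(line):
            # Exactly one of the two value groups is non-empty for a match.
            record[key] = quoted or bare
    return record


block = [
    '!NAME: "moe R1",             DESCR: "ASessor 1,bps"',
    '!PID: A9-55',
    '!VID: G0984981',
    '!SN: SEDGH25443N51E',
]
record = block_to_dict(block)
print(record)
```

Because the quoted alternative is tried first, a comma or colon inside the quotes (as in "ASessor 1,bps") stays part of the value.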

Getting the text between two identical lines in Python
