Question

I'm trying to write a Python script that iterates through .txt files. The files are usually 600-800 lines long and are formatted like this:

==========
ID: 10001      Found:(4)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_

==========
ID: 10002      Found:(26)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_
line2
line3
line4
line5

==========
ID: 10003      Found:(15039)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_
etc1
etc2
etc3

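Basically, I want to read from one 'ID:' to the next 'ID:' and store all the text between them in a string (or a list, a dictionary, whatever works). The problem is that the number of lines between 'ID:' markers varies a lot, so tracking them by line number won't help much. I'm quite new to Python and not as familiar with its basic syntax as I am with other languages. I've searched SO extensively and found many questions that are similar or close to what I need, but not exactly it. Any help would be greatly appreciated.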

Answer 1

You should read the file line by line and check whether the first word on each line is "ID:".

with open('workfile', 'r') as f:
    for line in f:
        arr = line.split()  # split on any whitespace; blank lines give []
        if arr and arr[0] == "ID:":
            # this line starts a new record; do what you need to here
            pass
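Filling in the "do what you need to" part, a minimal sketch that collects the lines after each "ID:" line into a dictionary keyed by the ID number. The helper name group_by_id and the choice to store a list of lines per ID (dropping blank lines and the ========== separators) are assumptions for illustration. It accepts any iterable of lines, so an open file object works as well as the in-memory sample used here.

```python
from typing import Dict, Iterable, List, Optional


def group_by_id(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the lines that follow each 'ID:' line under that ID (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None  # the ID whose body is currently being collected
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ID:":
            # A new record starts: remember its ID and give it an empty body.
            current = fields[1]
            records[current] = []
        elif current is not None and fields and set(line.strip()) != {"="}:
            # Keep non-blank, non-separator lines that belong to the current record.
            records[current].append(line.rstrip("\n"))
    return records


sample = """==========
ID: 10001      Found:(4)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_

==========
ID: 10002      Found:(26)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_
line2
line3
""".splitlines(keepends=True)

print(group_by_id(sample))
```

With a real file you would pass the file object directly, e.g. `with open('workfile') as f: records = group_by_id(f)`. Note that a body line that itself starts with "ID:" would be mistaken for a new record; the sample format suggests that does not happen.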

Answer 2

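This is a very simple implementation that only detects lines starting with the exact string "ID:". It ignores blank lines and lines that exactly match ==========.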

It saves the lines following each ID: into a dictionary whose keys are the ID strings (the number after "ID:").

from io import StringIO
from pprint import pprint

# StringIO stands in for a real file here (BytesIO would fail on a str in Python 3).
# With a real file, use `with open('workfile') as infile:` and indent the loop below.
infile = StringIO("""
==========
ID: 10001      Found:(4)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_

==========
ID: 10002      Found:(26)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_
line2
line3
line4
line5
""")


buffer = ""
d = {}
current_id = None  # avoid shadowing the built-in id()

for line in infile:
    if line.rstrip() in ("==========", ""):
        # skip blank lines and delimiter lines
        pass
    elif line.startswith("ID: "):
        # save the buffer we've been collecting to the dictionary...
        if current_id is not None:
            d[current_id] = buffer

        # ... and start collecting new lines
        current_id = line.split()[1]
        buffer = ""
    else:
        buffer += line

# save whatever lines are left over after the last `ID:`
if current_id is not None:
    d[current_id] = buffer

pprint(d)

Output (Python 3):

{'10001': 'MSG: ERR_ID  - ***ERROR*** _errortexthere_\n',
 '10002': 'MSG: ERR_ID  - ***ERROR*** _errortexthere_\n'
          'line2\n'
          'line3\n'
          'line4\n'
          'line5\n'}
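Because the files are only 600-800 lines, reading a whole file into memory is cheap, so a regular expression over the full text is another option. This swaps in `re.finditer` in place of the line-by-line loop above. The sketch assumes every record header looks like "ID: <digits> ..." followed directly by a line of "=" characters, as in the question's sample; the names RECORD and parse_records are hypothetical.

```python
import re
from typing import Dict

# Hypothetical pattern, assuming each record header is "ID: <digits> ..." followed
# by a line of "=" characters, exactly as in the question's sample.
RECORD = re.compile(
    r"^ID:\s*(\d+)[^\n]*\n"  # header line; group 1 is the ID number
    r"=+\n"                  # separator line directly under the header
    r"(.*?)"                 # group 2: the record body, matched lazily
    r"(?=^=+\nID:|\Z)",      # stop before the next "=====\nID:" block or at end of text
    re.MULTILINE | re.DOTALL,
)


def parse_records(text: str) -> Dict[str, str]:
    """Map each ID to its record body, with surrounding blank lines stripped."""
    return {m.group(1): m.group(2).strip() for m in RECORD.finditer(text)}


SAMPLE = """==========
ID: 10001      Found:(4)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_

==========
ID: 10002      Found:(26)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_
line2
line3
"""

print(parse_records(SAMPLE))
```

For a real file, something like `with open("errors.txt") as f: records = parse_records(f.read())` should work, where errors.txt is a placeholder name. The loop-based answers are easier to adapt if the format varies; the regex is shorter but breaks silently if a header does not match the assumed pattern.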

Iterate through a .txt file in Python and split into strings
