Iterating over a .txt file in Python and splitting it into strings

Asked: 2014-09-09 17:04:15

Tags: python io

I'm trying to write a Python script that iterates over .txt files. The files are typically 600-800 lines long and are formatted like this:

==========
ID: 10001      Found:(4)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_

==========
ID: 10002      Found:(26)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_
line2
line3
line4
line5

==========
ID: 10003      Found:(15039)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_
etc1
etc2
etc3

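Basically, I want to read from one 'ID:' to the next 'ID:' and store all the text between them in a string (or an array, a dictionary, whatever works). The problem is that the number of lines between the 'ID:' markers varies widely, so handling them by line number won't help much. I'm new to Python and not as familiar with its basic syntax as I am with other languages. I've searched SO extensively and found many questions that are similar or close to what I need, but none that match exactly. Any help would be appreciated.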

2 answers:

Answer 0 (score: 0)

You should read the file line by line and check whether the first element of each line is ID:

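with open('workfile', 'r') as f:
    for line in f:
        # split() with no argument also copes with blank lines and repeated spaces
        fields = line.split()
        if fields and fields[0] == "ID:":
            # do what you need to with this ID line
            pass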
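
To fill in the placeholder in that loop, here is a minimal sketch of one way to collect the lines under each ID. The function name group_lines_by_id and the shortened sample text are illustrative assumptions, not part of the original answer; it also assumes the delimiter lines are exactly ten '=' characters, as in the question.

```python
def group_lines_by_id(lines):
    """Map each ID to the list of non-blank, non-delimiter lines that follow it."""
    groups = {}
    current = None  # ID of the record currently being collected
    for line in lines:
        fields = line.split()
        if len(fields) > 1 and fields[0] == "ID:":
            # A new header: start a fresh list for this ID
            current = fields[1]
            groups[current] = []
        elif current is not None and line.strip() not in ("", "=========="):
            # Ordinary content line belonging to the current ID
            groups[current].append(line.rstrip("\n"))
    return groups


# Shortened stand-in for the file contents (hypothetical sample data)
sample = """==========
ID: 10001      Found:(4)
==========
MSG: first error

==========
ID: 10002      Found:(26)
==========
MSG: second error
line2
""".splitlines(True)

groups = group_lines_by_id(sample)
print(groups)
```

With a real file, the same function can be called as group_lines_by_id(f) inside a with open(...) block, since a file object yields its lines one at a time.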

Answer 1 (score: 0)

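This is a very simple implementation that only detects lines starting with the exact string "ID: ". It ignores blank lines and lines that exactly match ==========.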

It saves the lines following each ID: into a dictionary whose keys are the ID strings.

from io import StringIO
from pprint import pprint

# StringIO stands in for a real text file; use open(path) for a file on disk
infile = StringIO("""
==========
ID: 10001      Found:(4)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_

==========
ID: 10002      Found:(26)
==========
MSG: ERR_ID  - ***ERROR*** _errortexthere_
line2
line3
line4
line5
""")


buffer = ""
d = {}
id = None

for line in infile:
    if line.rstrip() in ("==========",""):
        # skip blank lines or delimiting lines
        pass
    elif line.startswith("ID: "):
        # save the buffer we've been collecting to the dictionary...
        if id is not None:        
            d[id] = buffer

        # ... and start collecting new lines
        id = line.split()[1]
        buffer = ""
    else:
        buffer += line
else:
    # save whatever lines are leftover after the last `ID:`
    if id is not None:
        d[id] = buffer

pprint(d)

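Output (as Python 2's pprint prints it; recent Python 3 versions wrap the long string over several lines):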

{'10001': 'MSG: ERR_ID  - ***ERROR*** _errortexthere_\n',
 '10002': 'MSG: ERR_ID  - ***ERROR*** _errortexthere_\nline2\nline3\nline4\nline5\n'}
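
Since the files are only 600-800 lines long, another option is to read the whole text into memory and split it on the header blocks with a regular expression. This is a sketch of a different technique from the line-by-line loop above, not the answerer's method. The names HEADER_RE and split_records are hypothetical, and it assumes every header is exactly the three-line pattern from the question: a line of '=' characters, an ID line, and another line of '=' characters.

```python
import re

# One header block: a line of '=', then "ID: <digits> ...", then a line of '='.
# The single capture group keeps the ID in the output of split().
HEADER_RE = re.compile(r"^=+\nID:[ \t]*(\d+).*\n=+\n", re.MULTILINE)


def split_records(text):
    """Return a dict mapping each ID to the text that follows its header."""
    parts = HEADER_RE.split(text)
    # parts looks like [text_before_first_header, id1, body1, id2, body2, ...]
    ids = parts[1::2]
    bodies = (body.strip() for body in parts[2::2])
    return dict(zip(ids, bodies))


# Shortened stand-in for the file contents (hypothetical sample data)
text = (
    "==========\n"
    "ID: 10001      Found:(4)\n"
    "==========\n"
    "MSG: first error\n"
    "\n"
    "==========\n"
    "ID: 10002      Found:(26)\n"
    "==========\n"
    "MSG: second error\n"
    "line2\n"
)

records = split_records(text)
print(records)
```

For a file on disk, text would come from something like open(path).read(). If an ID can appear more than once in a file, later records overwrite earlier ones in this sketch, just as in the dictionary-based loop.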