I'm trying to write a Python script that iterates through .txt files. The files are usually 600-800 lines long and are formatted like this:
==========
ID: 10001 Found:(4)
==========
MSG: ERR_ID - ***ERROR*** _errortexthere_
==========
ID: 10002 Found:(26)
==========
MSG: ERR_ID - ***ERROR*** _errortexthere_
line2
line3
line4
line5
==========
ID: 10003 Found:(15039)
==========
MSG: ERR_ID - ***ERROR*** _errortexthere_
etc1
etc2
etc3
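Basically, I want to read from one 'ID:' to the next 'ID:' and store all the text between them in a string (or a list, a dictionary, whatever works). The problem is that the number of lines between 'ID:' markers varies a lot, so tracking them by line number won't help much. I'm very new to Python and don't know its basic syntax as well as I know other languages. I've searched SO a lot and found many questions that are similar or close to what I need, but none that match exactly. Any help would be greatly appreciated.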
Answer 0 (score: 0)
Read the file line by line and check whether the first element of each line is "ID:".
with open('workfile', 'r') as f:
    for line in f:
        arr = line.split()
        # arr is empty for a blank line, so check it before indexing
        if arr and arr[0] == "ID:":
            # do what you need to with this ID line
            pass
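The check above only finds the ID lines. A minimal sketch of how the same test could be extended to collect the text between IDs, written as a generator. `iter_blocks` and the inline sample are hypothetical names for illustration, and the sketch assumes blank lines and `==========` lines should be dropped, as the next answer does.

```python
from io import StringIO


def iter_blocks(lines):
    """Yield (id, text) for each 'ID:' line and the lines that follow it.

    Hypothetical helper: uses the same test as above (the first
    whitespace-separated token is "ID:"). Blank lines and ==========
    delimiter lines are skipped; lines before the first ID are ignored.
    """
    current_id = None
    collected = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] == "ID:":
            # a new ID line ends the previous block
            if current_id is not None:
                yield current_id, "".join(collected)
            current_id = parts[1] if len(parts) > 1 else ""
            collected = []
        elif current_id is not None and line.strip() not in ("", "=========="):
            collected.append(line)
    # the last block has no following ID line, so flush it here
    if current_id is not None:
        yield current_id, "".join(collected)


# StringIO stands in for a real file in this example
sample = StringIO(
    "==========\n"
    "ID: 10001 Found:(4)\n"
    "==========\n"
    "MSG: ERR_ID - ***ERROR*** _errortexthere_\n"
    "==========\n"
    "ID: 10002 Found:(26)\n"
    "==========\n"
    "MSG: ERR_ID - ***ERROR*** _errortexthere_\n"
    "line2\n"
)

blocks = dict(iter_blocks(sample))
print(blocks)
```

With a real file, pass the file object itself, e.g. `with open("yourfile.txt") as f: blocks = dict(iter_blocks(f))`, where the file name is a placeholder. Because it is a generator, it only keeps one block in memory at a time.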
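Answer 1 (score: 0)
Here is a very simple implementation. It only detects lines that begin with the exact string "ID: ", and it ignores blank lines and lines that exactly match ==========.
It saves the lines that follow each ID: line into a dictionary keyed by the ID (e.g. '10001').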
from io import StringIO
from pprint import pprint

# StringIO stands in for a real file here (BytesIO only accepts bytes in
# Python 3); in a real script, loop over open("yourfile.txt") instead.
infile = StringIO("""
==========
ID: 10001 Found:(4)
==========
MSG: ERR_ID - ***ERROR*** _errortexthere_
==========
ID: 10002 Found:(26)
==========
MSG: ERR_ID - ***ERROR*** _errortexthere_
line2
line3
line4
line5
""")

buffer = ""
d = {}
current_id = None  # named to avoid shadowing the built-in id()
for line in infile:
    if line.rstrip() in ("==========", ""):
        # skip blank lines or delimiting lines
        pass
    elif line.startswith("ID: "):
        # save the buffer we've been collecting to the dictionary...
        if current_id is not None:
            d[current_id] = buffer
        # ... and start collecting new lines
        current_id = line.split()[1]
        buffer = ""
    else:
        buffer += line

# save whatever lines are left over after the last `ID:`
if current_id is not None:
    d[current_id] = buffer

# a wider width keeps each value on a single line
pprint(d, width=120)
Output:
{'10001': 'MSG: ERR_ID - ***ERROR*** _errortexthere_\n',
'10002': 'MSG: ERR_ID - ***ERROR*** _errortexthere_\nline2\nline3\nline4\nline5\n'}
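Since the files are only 600-800 lines, another option is to read each file into memory whole and cut it at the ID lines. This is a sketch of an alternative to the line-by-line answers, not part of them: `re.finditer` locates each `ID:` line and the text up to the next one becomes that ID's block, while `parse_folder` loops over the `.txt` files in a folder with `pathlib`. Both function names are hypothetical, and the sketch assumes every ID line has the `ID: <number> ...` form shown in the sample. Unlike the loop above, it drops the trailing newline from each block.

```python
import re
from pathlib import Path

# An "ID:" line at the start of a line; group 1 captures the ID number.
# Assumes the "ID: <number> ..." format from the sample files.
ID_LINE = re.compile(r"^ID: (\S+).*$", re.MULTILINE)


def parse_text(text):
    """Return {id: text between this ID line and the next one}."""
    result = {}
    matches = list(ID_LINE.finditer(text))
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # drop the ========== delimiters and blank lines inside the block
        body = [
            line for line in text[start:end].splitlines()
            if line.strip() not in ("", "==========")
        ]
        result[match.group(1)] = "\n".join(body)
    return result


def parse_folder(folder):
    """Parse every .txt file in `folder`; keys are the file names."""
    return {
        path.name: parse_text(path.read_text())
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

For example, `parse_folder("logs")` (a placeholder folder name) would return one dictionary per file, each mapping an ID such as '10001' to its message lines.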