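I need a little help. I have a large file that mainly contains lines from a screen capture. The blocks are separated by blank lines, and each block starts with the string 'zone'.
How can I build a dictionary in which test_1, test_2 and test_3 are the keys and each value is the list of all lines up to the next blank line?
Example for the first key: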
key = test_1
values = ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxxx]', '* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxxxxx]']
Any good tips? Thanks in advance.
Here is the text:
zone name test_1 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]
zone name test_2 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]
zone name test_2 vsan yy
pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]
Answer 0 (score: 2)
You can use a regular expression with a positive lookahead assertion.
Try (?m)^zone\sname\s(\w+).*([\s\S]*?)(?=[\n\r]+zone\sname\s|\Z)
In action:
import re

pattern = r'(?m)^zone\sname\s(\w+).*([\s\S]*?)(?=[\n\r]+zone\sname\s|\Z)'

with open('test.txt') as f:
    # findall yields (zone name, block body) pairs; drop empty lines from each body
    data = {
        k: [i for i in v.split('\n') if i]
        for k, v in dict(re.findall(pattern, f.read())).items()
    }

print(data)
Result:
{'test_1': ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]', '* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]'], 'test_2': ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]', '* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]'], 'test_3': ['pwwn 10:00:00:90:fa:81:bb:f2', '* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]']}
Regex explanation:
(?m) # asserts multiline matching
^zone\sname\s # matches zone name at start of line
( # matching group 1
\w+ # Matches your key
)
.* # Matches any character but new line
( # Matching group 2
[\s\S]*? # Matches until...
)
(?= # ... this group is found
[\n\r]+
zone\sname # same as first match
\s
| # or
\Z # end of string
)
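To see the lookahead at work without a file on disk, here is a minimal sketch that applies the same pattern to an in-memory string. The names ZONE_RE, parse_zones and sample are made up for this illustration, and the sample is a shortened copy of the question's text with the blank line between blocks assumed.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part can carry a comment.
ZONE_RE = re.compile(r"""
    ^zone\sname\s(\w+)            # group 1: the zone name (dictionary key)
    .*                            # rest of the header line
    ([\s\S]*?)                    # group 2: block body, matched lazily until ...
    (?=[\n\r]+zone\sname\s|\Z)    # ... the next zone header or the end of the string
""", re.MULTILINE | re.VERBOSE)

# Shortened sample (assumption: blocks are separated by one blank line).
sample = (
    "zone name test_1 vsan xx\n"
    "* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]\n"
    "* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]\n"
    "\n"
    "zone name test_2 vsan xx\n"
    "* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]\n"
)


def parse_zones(text):
    """Return {zone name: [non-empty body lines]} for every zone block in text."""
    zones = {}
    for match in ZONE_RE.finditer(text):
        name, body = match.groups()
        zones[name] = [line for line in body.splitlines() if line.strip()]
    return zones


zones = parse_zones(sample)
print(zones)
```

Because the lookahead does not consume the next header, each match stops right before the following "zone name" line, which then becomes the start of the next match.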
Answer 1 (score: 1)
Your screen capture differs from the text sample: the line breaks are different and test_2 is repeated. So, guessing at your intent:
txt = """
zone name test_1 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_2 vsan yy
pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]
"""

result = dict()
# blocks are separated by a blank line, i.e. two consecutive newlines
for block in txt.split("\n\n"):
    lines = block.strip().split("\n")
    k = lines[0].split(" ")[2]  # third word of the header line is the zone name
    # keep only the lines that start with an asterisk
    result.update({k: list(filter(lambda l: l.startswith("*"), lines))})
print(result)
If lines that do not start with an asterisk should be included as well, then:
result = dict()
for block in txt.split("\n\n"):
    lines = block.strip().split("\n")
    k = lines[0].split(" ")[2]
    result.update({k: lines[1:]})  # everything after the header line
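Dictionary keys are unique, so when a zone name is repeated (as test_2 is in the sample) and each block is assigned to its key, only the last block's lines survive. The question does not say what should happen in that case. If the repeated blocks should be merged instead, one possible sketch is shown here; parse_blocks is a hypothetical helper name, and merging is an assumption about the intent.

```python
import re


def parse_blocks(text):
    """Map each zone name to its lines, appending when a name occurs more than once."""
    zones = {}
    # Split on blank lines; \s* also tolerates stray spaces or \r between blocks.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # nothing to parse (e.g. empty input)
        name = lines[0].split()[2]  # "zone name <name> vsan ..."
        zones.setdefault(name, []).extend(lines[1:])
    return zones


sample = (
    "zone name test_2 vsan xx\n"
    "* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]\n"
    "\n"
    "zone name test_2 vsan yy\n"
    "pwwn 10:00:00:90:fa:81:bb:f2\n"
)
merged = parse_blocks(sample)
print(merged)
```

If the vsan value is what distinguishes the two blocks, using a (name, vsan) tuple as the key would be another option.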
Answer 2 (score: 0)
Here is another solution using groupby:
from itertools import groupby
import io

txt = """
zone name test_1 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_3 vsan yy
pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]
"""

BLANK, KEY, DATA = 0, 1, 2

def decide_line(line):
    # classify each line so that groupby can bundle consecutive lines of the same kind
    if (not line) or line.isspace():
        return BLANK
    else:
        return KEY if line.startswith('zone') else DATA

data_map = {}
with io.StringIO(txt) as f:
# with open('file.txt') as f:
    key = None
    for data_type, lines in groupby(map(str.strip, f), decide_line):
        if data_type == KEY:
            key = next(lines).split()[2]  # zone name from the header line
        elif data_type == DATA:
            data_map[key] = list(lines)  # all data lines that follow the header
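Since the question mentions a large file, here is a rough sketch of a plain line-by-line loop that never holds more than one line of input at a time. It is not the groupby approach above but a simpler state machine. parse_zone_lines and the sample text are assumptions made for illustration, and a zone with no member lines is assumed to map to an empty list.

```python
import io


def parse_zone_lines(lines):
    """Build {zone name: [lines]} from any iterable of lines, e.g. an open file."""
    zones = {}
    current = None  # list belonging to the zone being read, or None between blocks
    for raw in lines:
        line = raw.strip()
        if not line:
            current = None  # a blank line closes the current block
        elif line.startswith("zone name "):
            # setdefault keeps earlier lines if the same zone name shows up again
            current = zones.setdefault(line.split()[2], [])
        elif current is not None:
            current.append(line)
    return zones


sample = io.StringIO(
    "zone name test_1 vsan xx\n"
    "* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]\n"
    "\n"
    "zone name empty_zone vsan xx\n"
    "\n"
    "zone name test_3 vsan yy\n"
    "pwwn 10:00:00:90:fa:81:bb:f2\n"
    "* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]\n"
)
zone_map = parse_zone_lines(sample)
print(zone_map)
```

With a real file, the same function can be called as parse_zone_lines(f) inside a with open(...) as f: block, because file objects iterate line by line.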