Python: how to build a dict from this file

Time: 2018-05-29 20:52:59

Tags: python

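I need a little help. I have a large file that mostly contains lines like those in the screenshot. The blocks are separated by blank lines, and each block starts with the string 'zone'.

How can I build a dictionary in which test_1, test_2 and test_3 are the keys and each value is the list of all lines up to the next blank line?

Example for the first key: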

key = test_1

values = ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxxx]', '* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxxxxx]']

Any good tips? Thanks in advance.


Here is the text:

zone name test_1 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_2 vsan yy

pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]

3 answers:

Answer 0: (score: 2)

You can use a regular expression with a positive lookahead assertion.

Try (?m)^zone\sname\s(\w+).*([\s\S]*?)(?=[\n\r]+zone\sname\s|\Z)

In action:

import re

pattern = r'(?m)^zone\sname\s(\w+).*([\s\S]*?)(?=[\n\r]+zone\sname\s|\Z)'

with open('test.txt') as f:
  data = {k: [
    i for i in v.split('\n') if i
    ] for k, v in dict(re.findall(pattern, f.read())).items()}

  print(data)

Result:

{'test_1': ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]', '* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]'], 'test_2': ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]', '* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]'], 'test_3': ['pwwn 10:00:00:90:fa:81:bb:f2', '* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]']}

Regex explanation:

(?m)                             # multiline mode: ^ matches at the start of every line
^zone\sname\s                    # matches "zone name " at the start of a line
(                                # capture group 1
  \w+                            # the key (e.g. test_1)
)
.*                               # rest of the header line (any character except newline)
(                                # capture group 2
  [\s\S]*?                       # lazily matches anything, newlines included, until...
)
(?=                              # ...this lookahead succeeds
  [\n\r]+                        # one or more line breaks
  zone\sname                     # followed by the next "zone name" header
  \s
  |                              # or
  \Z                             # the end of the string
)
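
The commented layout above can be kept inside the code itself by compiling the pattern with re.VERBOSE. This is only a sketch: the name ZONE_RE and the shortened inline sample are made up for illustration and stand in for the real file.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so that
# whitespace and "#" comments inside the pattern are ignored.
ZONE_RE = re.compile(r"""
    ^zone\sname\s            # "zone name " at the start of a line
    (\w+)                    # group 1: the key, e.g. test_1
    .*                       # rest of the header line
    ([\s\S]*?)               # group 2: everything up to ...
    (?=[\n\r]+zone\sname\s   # ... the next zone header
       |\Z)                  # ... or the end of the input
""", re.MULTILINE | re.VERBOSE)

# Hypothetical inline sample standing in for the real file
sample = """zone name test_1 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]

zone name test_3 vsan yy

pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]
"""

# findall() returns (key, body) tuples; drop the empty lines from each body
zones = {
    key: [line for line in body.splitlines() if line.strip()]
    for key, body in ZONE_RE.findall(sample)
}
print(zones)
```

Because the lookahead does not consume the next header, each match stops right before it and the following search can start from that header.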

Answer 1: (score: 1)

Your screenshot differs from the text sample: the line breaks are different and test_2 appears twice. So, guessing at your intent:

txt = """
zone name test_1 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_2 vsan yy
pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]
"""

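result = {}
for block in txt.split("\n\n"):
    lines = block.strip().split("\n")
    # the header looks like "zone name <key> vsan ...", so the key is the third word
    k = lines[0].split(" ")[2]
    # keep only the lines that start with an asterisk
    result[k] = [line for line in lines[1:] if line.startswith("*")]

print(result)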

If lines that do not start with an asterisk should be included as well, then:

result = dict()
for block in txt.split("\n\n"):
    lines = block.strip().split("\n")
    k = lines[0].split(" ")[2]
    result.update({k: lines[1:]})
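
Note that the sample text repeats test_2, so a plain dict keeps only the last block stored under that key. If the blocks should be merged instead (an assumption, since the question does not say), here is a minimal sketch using collections.defaultdict. The shortened txt and the name merged are made up for illustration.

```python
from collections import defaultdict

# Hypothetical, shortened sample in which test_2 appears twice
txt = """
zone name test_1 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]

zone name test_2 vsan xx
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_2 vsan yy
pwwn 10:00:00:90:fa:81:bb:f2
"""

merged = defaultdict(list)
for block in txt.strip().split("\n\n"):
    lines = block.strip().split("\n")
    key = lines[0].split()[2]        # third word of the header line
    merged[key].extend(lines[1:])    # append to the key instead of overwriting it

merged = dict(merged)                # back to a plain dict for printing
print(merged)
```

If the vsan value is what tells the two test_2 blocks apart, another option would be to use a (name, vsan) tuple as the key.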

Answer 2: (score: 0)

Here is another solution using groupby:

from itertools import groupby
import io

txt = """
zone name test_1 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_3 vsan yy

pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]
"""

BLANK, KEY, DATA = 0, 1, 2

def decide_line(line):
    if (not line) or line.isspace():
        return BLANK
    else:
        return KEY if line.startswith('zone') else DATA

data_map = {}

with io.StringIO(txt) as f:
# with open('file.txt') as f:
    key = None
    for data_type, lines in groupby(map(str.strip, f), decide_line):
        if data_type == KEY:
            key = next(lines).split()[2]
        elif data_type == DATA:
            data_map[key] = list(lines)
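
To see why this works, it may help to look at what groupby yields: it bundles consecutive lines that get the same label, so a header, the blank lines and the data lines each arrive as their own run. A small sketch on a shortened, made-up input, where classify is a hypothetical stand-in for decide_line:

```python
from itertools import groupby

# Hypothetical, shortened input
lines = [
    "zone name test_1 vsan xx",
    "",
    "* fcid 0x170024",
    "* fcid 0x170016",
    "",
    "zone name test_2 vsan xx",
]

def classify(line):
    """Label a line as BLANK, KEY or DATA (same idea as decide_line)."""
    if not line.strip():
        return "BLANK"
    return "KEY" if line.startswith("zone") else "DATA"

# groupby only merges *adjacent* items with the same label
runs = [(kind, list(group)) for kind, group in groupby(lines, classify)]
for kind, group in runs:
    print(kind, group)
```

The two data lines come out as a single DATA run, which is why list(lines) in the answer collects a whole block at once.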