Question

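I need a little help. I have a big file that mostly contains lines like the ones in the screenshot. The blocks are separated by blank lines, and each block starts with the string 'zone'.

How can I build a dictionary so that test_1, test_2 and test_3 are my keys and the values are all the lines up to the next blank line?

Example for the first key: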

key = test_1

values = ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxxx]', '* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxxxxx]']

Any good hints? Thanks in advance.

Here is the text:

zone name test_1 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_2 vsan yy

pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]

Answer 1

You can use a regular expression with a positive lookahead assertion.

Try (?m)^zone\sname\s(\w+).*([\s\S]*?)(?=[\n\r]+zone\sname\s|\Z)

In action:

import re

pattern = r'(?m)^zone\sname\s(\w+).*([\s\S]*?)(?=[\n\r]+zone\sname\s|\Z)'

with open('test.txt') as f:
    text = f.read()

# findall yields (key, body) pairs; keep the non-empty lines of each body
data = {k: [line for line in v.split('\n') if line]
        for k, v in re.findall(pattern, text)}

print(data)

Result:

{'test_1': ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]', '* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]'], 'test_2': ['* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]', '* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]'], 'test_3': ['pwwn 10:00:00:90:fa:81:bb:f2', '* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]']}

Regex explanation:

(?m)                             # multiline mode: ^ matches at the start of every line
^zone\sname\s                    # matches "zone name " at the start of a line
(                                # capturing group 1
  \w+                            # matches your key
)
.*                               # matches the rest of the header line (no newline)
(                                # capturing group 2
  [\s\S]*?                       # lazily matches anything, newlines included, until...
)
(?=                              # ... this lookahead matches (nothing is consumed)
  [\n\r]+
  zone\sname                     # same as the first match
  \s
  |                              # or
  \Z                             # end of string
)
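
The annotated breakdown above is close to what Python's re.VERBOSE flag accepts directly, so here is a minimal sketch of the same pattern compiled that way and run on an in-memory string instead of a file. The names ZONE_RE, sample and zones are made up for this illustration, and the sample is shortened from the question's text.

```python
import re

# Same pattern as above; re.VERBOSE ignores the layout whitespace and comments.
ZONE_RE = re.compile(r"""
    ^zone\sname\s(\w+)      # group 1: the zone name (the dict key)
    .*                      # rest of the header line
    ([\s\S]*?)              # group 2: everything after the header, lazily
    (?=[\n\r]+zone\sname\s  # stop right before the next header...
       |\Z)                 # ...or at the end of the string
""", re.MULTILINE | re.VERBOSE)

# Shortened sample in the same layout as the question's text (assumption).
sample = """zone name test_1 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
"""

# Each match is a (name, body) pair; drop the blank lines from the body.
zones = {
    name: [line for line in body.splitlines() if line.strip()]
    for name, body in ZONE_RE.findall(sample)
}
print(zones)
```

Because the lookahead only checks for the next header without consuming it, the next match can start exactly at that header, which is why no block is skipped.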

Answer 2

Your screenshot differs from the text sample: different line breaks and a duplicated test_2. So, guessing at your intent:

txt = """
zone name test_1 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_2 vsan yy
pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]
"""

result = {}
for block in txt.split("\n\n"):
    lines = block.strip().split("\n")
    k = lines[0].split(" ")[2]  # third word of the header line is the zone name
    # keep only the member lines that start with an asterisk
    result[k] = [line for line in lines[1:] if line.startswith("*")]

print(result)

If lines that do not start with an asterisk should be included as well, then:

result = dict()
for block in txt.split("\n\n"):
    lines = block.strip().split("\n")
    k = lines[0].split(" ")[2]
    result.update({k: lines[1:]})
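
Note that with a duplicated zone name such as test_2, a plain dict keeps only the last block for that key. If that is not what you want, here is a rough sketch of two ways around it: merging the lines of same-named zones, or keying on the (name, vsan) pair. It assumes every header has exactly the five fields "zone name <name> vsan <vsan>"; the names merged and by_vsan are invented for the example, and the sample text is shortened.

```python
from collections import defaultdict

# Shortened sample with a repeated zone name (assumption: one header per block).
txt = """
zone name test_1 vsan xx
* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]

zone name test_2 vsan xx
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_2 vsan yy
pwwn 10:00:00:90:fa:81:bb:f2
"""

merged = defaultdict(list)   # zone name -> lines from every block with that name
by_vsan = {}                 # (zone name, vsan) -> lines of that single block

for block in txt.strip().split("\n\n"):
    header, *members = block.strip().split("\n")
    # Assumed header layout: "zone name <name> vsan <vsan>"
    _, _, name, _, vsan = header.split()
    merged[name].extend(members)
    by_vsan[(name, vsan)] = members

print(dict(merged))
print(by_vsan)
```

Which of the two fits depends on whether the same zone name in different VSANs means the same zone in your data, which the question does not say.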

Answer 3

Here is another solution, using groupby:

from itertools import groupby
import io

txt = """
zone name test_1 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xxx]
* fcid 0x170016 [pwwn 50:06:0e:80:16:60:ef:43] [xxxxxx]

zone name test_2 vsan xx

* fcid 0x170024 [pwwn 10:00:00:00:c9:5f:84:93] [xx]
* fcid 0x170017 [pwwn 50:06:0e:80:16:60:ef:63] [xxx]

zone name test_3 vsan yy

pwwn 10:00:00:90:fa:81:bb:f2
* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]
"""

BLANK, KEY, DATA = 0, 1, 2

def decide_line(line):
    if (not line) or line.isspace():
        return BLANK
    else:
        return KEY if line.startswith('zone') else DATA

data_map = {}

with io.StringIO(txt) as f:
# with open('file.txt') as f:
    key = None
    for data_type, lines in groupby(map(str.strip, f), decide_line):
        if data_type == KEY:
            key = next(lines).split()[2]
        elif data_type == DATA:
            data_map[key] = list(lines)
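
To see what the loop above ends up with, the same logic can be wrapped in a small function and fed any iterable of lines. This is only a sketch: parse_zones and kind are hypothetical names, and the sample holds a single block taken from the text above.

```python
import io
from itertools import groupby

def parse_zones(lines):
    """Return {zone name: [data lines]} from an iterable of lines (hypothetical helper)."""
    def kind(line):
        # Classify an already stripped line as blank, header ("key") or data.
        if not line:
            return "blank"
        return "key" if line.startswith("zone") else "data"

    zones, key = {}, None
    for line_kind, group in groupby(map(str.strip, lines), kind):
        if line_kind == "key":
            key = next(group).split()[2]   # third word is the zone name
        elif line_kind == "data" and key is not None:
            zones[key] = list(group)
    return zones

# StringIO stands in for an open file; open('file.txt') would work the same way.
sample = io.StringIO(
    "zone name test_3 vsan yy\n"
    "\n"
    "pwwn 10:00:00:90:fa:81:bb:f2\n"
    "* fcid 0x0b00c0 [pwwn 50:06:0e:80:07:e6:2e:26]\n"
)
zones = parse_zones(sample)
print(zones)
```

Both map and groupby are lazy, so the file is consumed one line at a time, which should suit the big file mentioned in the question.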
