Question

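I have a text file containing CLI output. I need to pull certain information out of each record, but the file is not well formatted. Here are my regex and the data: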

I can't get the guarantee value (file/none) into a capture group. Any help is greatly appreciated.

Edit

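I need the values of volume, state, raid, flex, root (if present) and the "guarantee" field (guarantee=XXX, where I only need XXX), each in a separate capture group, all in a single match. The rest of the data isn't much use for my use case.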

Answer 1

The regex I used is:

(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)|(?P<options>(?:(?:root|\w+=\w+))+)|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+

It uses named groups so each piece is easy to identify (if any of them ends up in the wrong group, just let me know).

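Using a positive lookahead, it grabs the volume and state, then looks for the options, and finally handles the status. The status types seem to be well defined, at least in this data.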
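To make the lookahead mechanics concrete, here is a minimal sketch (assuming Python 3's re module) that runs the same pattern over a shortened copy of the first vol0 line. For each consumed token, it shows which named group claimed it. The lookahead itself is zero-width, so the trailing [\w=]+ is what actually consumes each token and moves the scan forward.

```python
import re

# Same pattern as above, split across lines for readability.
PATTERN = re.compile(
    r"(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)"
    r"|(?P<options>(?:(?:root|\w+=\w+))+)"
    r"|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+"
)

# Shortened version of the first vol0 line from the data.
line = "           vol0 online          raid_dp, flex         root, guarantee=file"

tokens = []
for m in PATTERN.finditer(line):
    # The lookahead picks the group; [\w=]+ consumes the token itself.
    # Keep only the named groups that took part in this match.
    hits = {name: val for name, val in m.groupdict().items() if val is not None}
    tokens.append((m.group(0), hits))

for token, hits in tokens:
    print(token, hits)
```

Here vol0 is paired with volume and state, raid_dp and flex with status, and root and guarantee=file with options. That is why the loop below has to remember the current volume and attach later tokens to it.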

Here is the Python I used to get what I think is the output you want:

import re

regex = r"(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)|(?P<options>(?:(?:root|\w+=\w+))+)|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+"
test_str = ("2 entries were acted on.\n\n"
            "Node: abc-01\n"
            "         Volume State           Status                Options\n"
            "           vol0 online          raid_dp, flex         root, guarantee=file, nvfail=on, space_slo=none\n"
            "                                64-bit\n\n"
            "Node: abc-02\n"
            "         Volume State           Status                Options\n"
            "           vol0 online          raid_dp, flex         root, nvfail=on, space_slo=none, guarantee=none\n"
            "                                64-bit\n"
            " asdfbw017_5_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            " asdfbw018_2_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none, guarantee=none\n"
            "werwr_1_WINDOWS_1 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "werwr_2_RHEL_2_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "werwr_2_RHEL_1_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none, guarantee=none\n"
            "werwr_1_WINDOWS_2 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "  werwr_1_ESX_2 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            " asdfbw018_1_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "  werwr_1_ESX_4 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "werwr_1_W2K8_01 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            " asdfbw017_2_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none")

matches = re.finditer(regex, test_str, re.MULTILINE)

allData = {}
currentVolume = ""

for match in matches:
    volume = match.group('volume')
    state = match.group('state')
    options = match.group('options')
    status = match.group('status')

    if volume is not None:
        # A new record starts; the status/option tokens that follow belong to it.
        # Note: volume names repeat across nodes (vol0 on abc-01 and abc-02),
        # so a later record with the same name overwrites the earlier one.
        currentVolume = volume
        allData[volume] = {'state': [], 'options': [], 'status': []}
    if not currentVolume:
        # Ignore any token that appears before the first volume line.
        continue
    if status is not None:
        allData[currentVolume]['status'].append(status)
    if state is not None:
        allData[currentVolume]['state'].append(state)
    if options is not None:
        allData[currentVolume]['options'].append(options)

print(allData)

Sample output for one volume:

{'werwr_1_ESX_2': {'options': ['nvfail=on', 'create_ucode=on', 'convert_ucode=on', 'schedsnapname=create_time', 'fractional_reserve=0', 'space_slo=none'], 'status': ['raid4', 'flex', 'cluster', '64-bit'], 'state': ['online']}

See it here.
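The dictionary above still holds the options as raw key=value strings. A small post-processing step can pull out the individual fields the question asks for. This is a sketch only: the helper name summarize is hypothetical, and it assumes each entry has the 'state', 'status' and 'options' lists built by the loop above. The example entry is the werwr_1_ESX_2 sample output.

```python
def summarize(entry):
    """Hypothetical helper: flatten one allData entry into the requested fields."""
    opts = {}
    for opt in entry["options"]:
        # "guarantee=file" -> opts["guarantee"] = "file"; bare flags like "root" -> True
        key, sep, value = opt.partition("=")
        opts[key] = value if sep else True
    # First status token that starts with "raid" (raid_dp, raid4), if any.
    raid = next((s for s in entry["status"] if s.startswith("raid")), None)
    return {
        "state": entry["state"][0] if entry["state"] else None,
        "raid": raid,
        "flex": "flex" in entry["status"],
        "root": opts.get("root", False),
        "guarantee": opts.get("guarantee"),  # None when the option is absent
    }

# The werwr_1_ESX_2 entry from the sample output above.
esx = {'options': ['nvfail=on', 'create_ucode=on', 'convert_ucode=on',
                   'schedsnapname=create_time', 'fractional_reserve=0', 'space_slo=none'],
       'status': ['raid4', 'flex', 'cluster', '64-bit'],
       'state': ['online']}
print(summarize(esx))
# {'state': 'online', 'raid': 'raid4', 'flex': True, 'root': False, 'guarantee': None}
```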

Edit

After the question was edited, I came up with the following:

(?P<volume>\w+)\s+(?P<state>o(?:n|ff)line)\s+(?:(?P<raid>raid\w+),\s+)?(?P<flex>flex)?(?:.*?(?:(?P<root>root)|guarantee=(?P<guarantee>\w+)))*

See it here.

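It still uses a repeated group with alternation (specifically for the root|guarantee part) rather than assuming their order, but it should now do what you want in a single match. One caveat: . does not match newlines by default, so root and guarantee are only picked up when they appear on the same line as the volume name.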
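To also catch guarantee= values that sit on a record's continuation lines, one workaround is to glue each record's indented continuation lines onto its first line, then apply the same pattern to each joined record. This is an assumption on my part, not a tested drop-in. The helper join_records and the record-start rule (a line shaped like "name online|offline") are hypothetical, and the sample is an excerpt of the question's data.

```python
import re

# The second pattern from this answer: one match per volume record.
RECORD = re.compile(
    r"(?P<volume>\w+)\s+(?P<state>o(?:n|ff)line)\s+(?:(?P<raid>raid\w+),\s+)?"
    r"(?P<flex>flex)?(?:.*?(?:(?P<root>root)|guarantee=(?P<guarantee>\w+)))*"
)
# Assumption: a record starts on a line shaped like "<name> online|offline".
RECORD_START = re.compile(r"^\s*\w+\s+o(?:n|ff)line\b")

def join_records(text):
    """Hypothetical helper: glue indented continuation lines onto their record."""
    records, in_record = [], False
    for line in text.splitlines():
        if RECORD_START.match(line):
            records.append(line.strip())
            in_record = True
        elif in_record and line.startswith(" ") and line.strip():
            records[-1] += " " + line.strip()
        else:
            # Blank lines, "Node: ..." lines and column headers end the record.
            in_record = False
    return records

SAMPLE = """Node: abc-01
         Volume State           Status                Options
           vol0 online          raid_dp, flex         root, guarantee=file, nvfail=on, space_slo=none
                                64-bit

Node: abc-02
         Volume State           Status                Options
           vol0 online          raid_dp, flex         root, nvfail=on, space_slo=none, guarantee=none
                                64-bit
 asdfbw018_2_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,
                                cluster               schedsnapname=create_time, fractional_reserve=0,
                                64-bit                space_slo=none, guarantee=none
werwr_1_WINDOWS_1 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,
                                cluster               schedsnapname=create_time, fractional_reserve=0,
                                64-bit                space_slo=none
"""

FIELDS = ("volume", "state", "raid", "flex", "root", "guarantee")
results = [
    {k: m.group(k) for k in FIELDS}
    for m in (RECORD.match(rec) for rec in join_records(SAMPLE))
    if m
]
for row in results:
    print(row)

# Without joining, the guarantee on a continuation line is missed.
unjoined = {m.group("volume"): m.group("guarantee") for m in RECORD.finditer(SAMPLE)}
print(unjoined["asdfbw018_2_vol"])  # None
```

Joining keeps the regex itself unchanged. Any line that starts an "online"/"offline" record opens a new entry, and everything indented beneath it is treated as part of that entry until a blank or unindented line appears.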

How to match a specific string as a capture group at an unspecified position
