How to match a specific string as a capture group when its position is not fixed

Time: 2018-01-31 12:02:05

Tags: regex

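I have a text file with CLI output, and I need to pull certain information out of each record, but the file is not well formatted. Here are my regex and the data: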

https://regex101.com/r/1T3icV/2

I can't get the value of guarantee (file/none) into a capture group. Any help is greatly appreciated.

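EDIT

I need the values of volume, state, raid, flex, root (if present) and the "guarantee" field (guarantee=XXX, where I only need XXX), each in its own capture group, all in a single match. The rest of the data isn't much use for my use case.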

1 Answer:

Answer 0 (score: 1)

The regex I used is:

(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)|(?P<options>(?:(?:root|\w+=\w+))+)|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+

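This uses named groups as placeholders so that each grouping is easy to identify (if any of them end up in the wrong place, just let me know).

Everything sits inside a positive lookahead. The first alternative grabs the volume and its state, the second looks for options, and the last one handles the status. The trailing [\w=]+ then consumes the token that was recognized. The status types seem to be well defined, at least in this data.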
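To see what the lookahead buys you, here is a minimal sketch that runs the same pattern over a single record line. For each match it lists the text consumed by [\w=]+ and the named groups that were set. The helper name tokens and the choice of sample line (taken from the first vol0 record in the data) are only for illustration.

```python
import re

# Same pattern as above: the lookahead decides which named group applies,
# then [\w=]+ consumes the token itself.
PATTERN = re.compile(r"(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)|(?P<options>(?:(?:root|\w+=\w+))+)|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+")


def tokens(line):
    """Return (consumed text, participating named groups) for every match."""
    out = []
    for m in PATTERN.finditer(line):
        # groupdict() reports None for groups that did not take part
        found = {k: v for k, v in m.groupdict().items() if v is not None}
        out.append((m.group(0), found))
    return out


for text, found in tokens("vol0 online          raid_dp, flex         root, guarantee=file"):
    print(text, found)
```

Each match carries exactly one kind of information. The volume match is the only one that also sets state, which is why the loop in the script below can treat a volume match as the start of a new record.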

Here is the Python I used to get what I think is the output you want:

import re

regex = r"(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)|(?P<options>(?:(?:root|\w+=\w+))+)|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+"
test_str = ("2 entries were acted on.\n\n"
            "Node: abc-01\n"
            "         Volume State           Status                Options\n"
            "           vol0 online          raid_dp, flex         root, guarantee=file, nvfail=on, space_slo=none\n"
            "                                64-bit\n\n"
            "Node: abc-02\n"
            "         Volume State           Status                Options\n"
            "           vol0 online          raid_dp, flex         root, nvfail=on, space_slo=none, guarantee=none\n"
            "                                64-bit\n"
            " asdfbw017_5_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            " asdfbw018_2_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none, guarantee=none\n"
            "werwr_1_WINDOWS_1 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "werwr_2_RHEL_2_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "werwr_2_RHEL_1_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none, guarantee=none\n"
            "werwr_1_WINDOWS_2 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "  werwr_1_ESX_2 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            " asdfbw018_1_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "  werwr_1_ESX_4 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            "werwr_1_W2K8_01 online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none\n"
            " asdfbw017_2_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
            "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
            "                                64-bit                space_slo=none")

# Each match is one token: a volume (with its state), a status keyword or an option
matches = re.finditer(regex, test_str, re.MULTILINE)

allData = {}
currentVolume = ""

for match in matches:
    # match.group() returns None for named groups that did not take part in this match
    volume = match.group('volume')
    state = match.group('state')
    options = match.group('options')
    status = match.group('status')

    # A volume name starts a new record; the tokens after it belong to that volume.
    # Note: a name that repeats on another node (vol0 here) overwrites the earlier record.
    if volume is not None:
        currentVolume = volume
        allData[volume] = {'state': [], 'options': [], 'status': []}
    if status is not None:
        allData[currentVolume]['status'].append(status)
    if state is not None:
        allData[currentVolume]['state'].append(state)
    if options is not None:
        allData[currentVolume]['options'].append(options)

print(allData)

Sample output for one volume:

{'werwr_1_ESX_2': {'options': ['nvfail=on', 'create_ucode=on', 'convert_ucode=on', 'schedsnapname=create_time', 'fractional_reserve=0', 'space_slo=none'], 'status': ['raid4', 'flex', 'cluster', '64-bit'], 'state': ['online']}

See it here
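The output above still holds guarantee=... as a raw option string. One possible follow-up works on records shaped like the allData entries above. A small hypothetical helper, options_to_dict, splits the options into a dict so the guarantee value (file/none) can be looked up directly, and bare flags such as root become True. The record below is built by hand from the first vol0 line of the sample data. It is not copied from the script's output.

```python
def options_to_dict(options):
    """Turn option tokens into a dict: 'key=value' -> {key: value}, bare flags -> True."""
    parsed = {}
    for opt in options:
        key, sep, value = opt.partition("=")
        # sep is empty when there was no '=', i.e. a bare flag like 'root'
        parsed[key] = value if sep else True
    return parsed


# Hand-built record in the same shape as allData[volume]
# (values taken from the first vol0 line of the sample data)
record = {
    'state': ['online'],
    'status': ['raid_dp', 'flex', '64-bit'],
    'options': ['root', 'guarantee=file', 'nvfail=on', 'space_slo=none'],
}

opts = options_to_dict(record['options'])
print(opts.get('guarantee'))  # file
print('root' in opts)         # True
```

Volumes without a guarantee option simply give None from opts.get('guarantee').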

EDIT

After the question was edited, I managed to come up with the following:

(?P<volume>\w+)\s+(?P<state>o(?:n|ff)line)\s+(?:(?P<raid>raid\w+),\s+)?(?P<flex>flex)?(?:.*?(?:(?P<root>root)|guarantee=(?P<guarantee>\w+)))*

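See it here

It still relies on grouping (specifically the root|guarantee alternation inside a repeated group) rather than assuming the order of those two options, but it should now give you what you want in a single match.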
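Here is a minimal sketch of using this second pattern from Python. It assumes default flags (no re.DOTALL) and a hypothetical parse_volumes helper that returns one groupdict() per volume. Groups that did not participate come back as None, so root and guarantee are simply None when a volume has neither. The sample records are copied from the question's data.

```python
import re

# The second pattern from the answer, compiled with Python's default flags
VOLUME_RE = re.compile(r"(?P<volume>\w+)\s+(?P<state>o(?:n|ff)line)\s+(?:(?P<raid>raid\w+),\s+)?(?P<flex>flex)?(?:.*?(?:(?P<root>root)|guarantee=(?P<guarantee>\w+)))*")


def parse_volumes(text):
    """One dict per volume: volume, state, raid, flex, root, guarantee (None if absent)."""
    return [m.groupdict() for m in VOLUME_RE.finditer(text)]


sample = (
    "           vol0 online          raid_dp, flex         root, nvfail=on, space_slo=none, guarantee=none\n"
    "                                64-bit\n"
    " asdfbw017_5_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
    "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
    "                                64-bit                space_slo=none\n"
    " asdfbw018_2_vol online          raid4, flex           nvfail=on, create_ucode=on, convert_ucode=on,\n"
    "                                cluster               schedsnapname=create_time, fractional_reserve=0,\n"
    "                                64-bit                space_slo=none, guarantee=none\n"
)

for row in parse_volumes(sample):
    print(row)
```

One caveat under these assumptions: . does not match a newline, so the repeated .*? only searches the first line of each record. A guarantee= that appears only on a continuation line (asdfbw018_2_vol above) is therefore not captured. Turning on re.DOTALL is not a drop-in fix, because the repeated .*? could then run on into the following records.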