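I have a text file containing CLI output, and I need to extract certain pieces of information from each record, but the file is not well formatted. Here are my regex and the data: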
https://regex101.com/r/1T3icV/2
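I can't get the value of guarantee (file/none) into a capture group. Any help is much appreciated.
Edit
I need to get the volume, state, raid, flex, root (if available), and the value of the "guarantee" field (guarantee=XXX, I only need XXX), each in a separate capture group, in a single match. The rest of the data is of little use for my use case.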
Answer 0 (score: 1)
The regex I used is:
(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)|(?P<options>(?:(?:root|\w+=\w+))+)|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+
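This uses named groups so that each capture is easy to identify (basically, they let me know if any of them ends up in the wrong place).
Using a positive lookahead, it grabs the volume and state first, then looks for options, and finally handles the status. The status types seem to be well defined, at least in this data.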
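To make the role of the named groups concrete, here is a minimal sketch that runs the pattern over a single record and shows which group captured each token. The one-line record is taken from the sample data with fields separated by single spaces, which is an assumption about the input layout; the variable names `line` and `tokens` are illustrative only.

```python
import re

# Pattern copied verbatim from the answer.
regex = r"(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)|(?P<options>(?:(?:root|\w+=\w+))+)|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+"

# One record from the sample data, assumed to use single spaces between fields.
line = "vol0 online raid_dp, flex root, guarantee=file, nvfail=on"

# For every match, keep only the named groups that actually captured something.
tokens = [
    {name: value for name, value in m.groupdict().items() if value is not None}
    for m in re.finditer(regex, line)
]

for token in tokens:
    print(token)
```

Each match corresponds to one token of the record, so the volume/state pair arrives in one match and every status or option arrives in a match of its own.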
This is the Python I used to get the output I think you want:
import re
regex = r"(?=(?P<volume>\w+) (?P<state>o(?:n|ff)line)|(?P<options>(?:(?:root|\w+=\w+))+)|(?P<status>(?:(?:raid(?:_dp|4)|flex|cluster|64-bit))))[\w=]+"
test_str = ("2 entries were acted on.\n\n"
"Node: abc-01\n"
" Volume State Status Options\n"
" vol0 online raid_dp, flex root, guarantee=file, nvfail=on, space_slo=none\n"
" 64-bit\n\n"
"Node: abc-02\n"
" Volume State Status Options\n"
" vol0 online raid_dp, flex root, nvfail=on, space_slo=none, guarantee=none\n"
" 64-bit\n"
" asdfbw017_5_vol online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none\n"
" asdfbw018_2_vol online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none, guarantee=none\n"
"werwr_1_WINDOWS_1 online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none\n"
"werwr_2_RHEL_2_vol online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none\n"
"werwr_2_RHEL_1_vol online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none, guarantee=none\n"
"werwr_1_WINDOWS_2 online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none\n"
" werwr_1_ESX_2 online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none\n"
" asdfbw018_1_vol online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none\n"
" werwr_1_ESX_4 online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none\n"
"werwr_1_W2K8_01 online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none\n"
" asdfbw017_2_vol online raid4, flex nvfail=on, create_ucode=on, convert_ucode=on,\n"
" cluster schedsnapname=create_time, fractional_reserve=0,\n"
" 64-bit space_slo=none")
matches = re.finditer(regex, test_str, re.MULTILINE)
allData = {}
currentVolume = ""
for match in matches:
    groups = match.groupdict()
    volume = groups['volume']
    state = groups['state']
    options = groups['options']
    status = groups['status']
    # A volume match starts a new record; later matches attach to it.
    if volume is not None:
        currentVolume = volume
        allData[volume] = {'state': [], 'options': [], 'status': []}
    if status is not None:
        allData[currentVolume]['status'].append(status)
    if state is not None:
        allData[currentVolume]['state'].append(state)
    if options is not None:
        allData[currentVolume]['options'].append(options)
print(allData)
Sample output for one volume:
{'werwr_1_ESX_2': {'options': ['nvfail=on', 'create_ucode=on', 'convert_ucode=on', 'schedsnapname=create_time', 'fractional_reserve=0', 'space_slo=none'], 'status': ['raid4', 'flex', 'cluster', '64-bit'], 'state': ['online']}
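Since the question asks only for the XXX part of guarantee=XXX, the options list still needs one more step. The following is a minimal sketch of that step, not part of the original answer: `get_option` is a hypothetical helper, and `record` is a hand-written entry shaped like one value of `allData`.

```python
def get_option(options, key):
    """Return the value of a 'key=value' entry in a list of option strings, or None."""
    for option in options:
        name, sep, value = option.partition("=")
        if sep and name == key:
            return value
    return None


# Hypothetical entry, shaped like one value of allData from the code above.
record = {
    'state': ['online'],
    'options': ['root', 'guarantee=file', 'nvfail=on', 'space_slo=none'],
    'status': ['raid_dp', 'flex', '64-bit'],
}

guarantee = get_option(record['options'], 'guarantee')  # value after 'guarantee='
has_root = 'root' in record['options']                  # 'root' is a bare flag

print(guarantee, has_root)
```

For a volume whose options contain no guarantee entry, the helper returns None rather than raising.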
Edit
After the question was edited, I managed to come up with the following:
(?P<volume>\w+)\s+(?P<state>o(?:n|ff)line)\s+(?:(?P<raid>raid\w+),\s+)?(?P<flex>flex)?(?:.*?(?:(?P<root>root)|guarantee=(?P<guarantee>\w+)))*
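It still relies on alternation (specifically for the root|guarantee part) rather than assuming an order for them, but it should now do what you want in a single match.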
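As a rough illustration of the single-match pattern in Python, the sketch below applies it to three records. It assumes each record sits on one line (the third sample record is shortened and joined by hand for this purpose), and the names `pattern`, `sample` and `records` are illustrative only.

```python
import re

# Pattern copied verbatim from the edit above.
pattern = re.compile(
    r"(?P<volume>\w+)\s+(?P<state>o(?:n|ff)line)\s+(?:(?P<raid>raid\w+),\s+)?"
    r"(?P<flex>flex)?(?:.*?(?:(?P<root>root)|guarantee=(?P<guarantee>\w+)))*"
)

# Assumption: one record per line. The last record is a shortened, joined
# version of a multi-line record from the sample data.
sample = (
    "vol0 online raid_dp, flex root, guarantee=file, nvfail=on, space_slo=none\n"
    "vol0 online raid_dp, flex root, nvfail=on, space_slo=none, guarantee=none\n"
    "asdfbw017_5_vol online raid4, flex nvfail=on, create_ucode=on, space_slo=none\n"
)

# One match per record; groups that did not take part in the match are None.
records = [m.groupdict() for m in pattern.finditer(sample)]

for rec in records:
    print(rec)
```

Without re.DOTALL, `.` stops at a newline, so a guarantee that sits on a continuation line of a wrapped record would not be captured. Enabling re.DOTALL would instead let `.*?` run on into the following record, so joining wrapped records onto one line first is probably the safer route.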