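Is there a good regular expression, function, or package for parsing indented, structured text into a dictionary? For example, I have data like this (it may be nested more deeply than shown below):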
xyz1 : 14
xyz2 : 35
xyz3 : 14
xyz4
  sub1_xyz4
    sub1_sub1_xyz4 : 45
    sub2_sub1_xyz4 : b1fawe
  sub2 xyz4 : 455
xyz5 : 2424
I want to convert it into a dictionary like this:
{
    'xyz1': '14',
    'xyz2': '35',
    'xyz3': '14',
    'xyz4': {
        'sub1_xyz4': {
            'sub1_sub1_xyz4': '45',
            'sub2_sub1_xyz4': 'b1fawe',
        },
        'sub2_xyz4': '455'
    },
    'xyz5': '2424'
}
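I tried the following but could not get it to work. I suspect there is a clean recursive approach to handling the indentation and sub-attributes, so that it can cope with unknown depth. Any suggestions?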
def parse_output(value, indent=0):
    parsed_dict = dict()
    if indent > 0:
        for i in re.split('\n(?!\s{,%d})' % (indent - 1), value):
            print("split value is: : ", i)
            if '\n' not in i:
                iter_val = iter(list(map(lambda x: x.strip(), re.split(' : ', i))))
                parsed_dict = {**parsed_dict, **dict(zip(iter_val, iter_val))}
            else:
                parse_bearer_info(re.split('\n', i, 1)[1])
                iter_val = iter(list(map(lambda x: x.strip(), re.split('\n', i, 1))))
                parsed_dict = {**parsed_dict, **dict(zip(iter_val, iter_val))}
    else:
        for i in re.split('\n(?!\s+)', value):
            #print("iteration value is: ", i)
            if '\n' not in i:
                iter_val = iter(list(map(lambda x: x.strip(), re.split(' : ', i))))
                parsed_dict = {**parsed_dict, **dict(zip(iter_val, iter_val))}
            else:
                #print(re.split('\n', i, 1))
                #out = parse_bearer_info(re.split('\n', i, 1)[1], 4)
                iter_val = iter(list(map(lambda x: x.strip(), re.split('\n', i, 1))))
                parsed_dict = {**parsed_dict, **dict(zip(iter_val, iter_val))}
    return parsed_dict
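Answer 0 (score: 2)
You could probably do this recursively, but since you only need to track a single indentation level, you can simply keep a stack of the dictionaries currently being filled. Add each key to the last item on the stack. When a key has no value, add a new dictionary and push it onto the stack. When the indentation decreases, pop from the stack.
Something like this: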
res = {}
stack = [res]      # stack[-1] is the dictionary currently being filled
cur_indent = 0
for line in s.split('\n'):  # s is the indented input text from the question
    indent = len(line) - len(line.lstrip())
    if indent < cur_indent:  # backing out: pops one level per dedent
        stack.pop()
    cur_indent = indent
    vals = line.replace(" ", "").split(':')  # also removes spaces inside keys
    current_dict = stack[-1]
    if len(vals) == 2:
        current_dict[vals[0]] = vals[1]
    else:  # no value, must be a new level
        current_dict[vals[0]] = {}
        stack.append(current_dict[vals[0]])
Result (the key "sub2 xyz4" becomes 'sub2xyz4' because the code removes all spaces from each line):
{'xyz1': '14',
 'xyz2': '35',
 'xyz3': '14',
 'xyz4': {'sub1_xyz4': {'sub1_sub1_xyz4': '45', 'sub2_sub1_xyz4': 'b1fawe'},
          'sub2xyz4': '455'},
 'xyz5': '2424'}
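The stack approach above pops a single level whenever the indentation decreases, so it relies on the input never dropping more than one level at a time. A minimal sketch of a variant that stores each dictionary together with the indentation of its parent key is shown below. The function name `parse_indented` and the `sample` text are illustrative assumptions, and the sketch assumes indentation uses spaces consistently and that a line without a colon introduces a nested block.

```python
def parse_indented(text):
    """Parse indented 'key : value' lines into nested dictionaries."""
    root = {}
    # Each entry is (indent of the parent key line, dict holding its children).
    # The root uses -1 so that it is never popped.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip())
        # Leave every block whose parent key is indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        current = stack[-1][1]
        key, sep, value = line.partition(':')
        key = key.strip()
        if sep:
            # 'key : value' line; spaces inside the key are preserved.
            current[key] = value.strip()
        else:
            # No colon: assume the key opens a new nested level.
            current[key] = {}
            stack.append((indent, current[key]))
    return root


sample = """\
xyz1 : 14
xyz4
  sub1_xyz4
    sub1_sub1_xyz4 : 45
xyz5 : 2424
"""
result = parse_indented(sample)
print(result)
```

In this sample the line `xyz5 : 2424` drops two levels at once, and the `while` loop pops both nested dictionaries before the key is stored at the top level.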
Answer 1 (score: 2)
You can use itertools.groupby with recursion:
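import itertools, re, json

# content is the indented input text (shown at the end of this answer)
_data = [re.split(r'\s+:\s+', i) for i in filter(None, content.split('\n'))]

def group_data(d):
    # Split the rows into runs: unindented "key : value" rows (True) versus
    # a parent key followed by its indented children (False)
    _d = [[a, list(b)] for a, b in itertools.groupby(d, key=lambda x: bool(x[-1]) and not x[0].startswith(' '))]
    _new_result = {}
    for a, b in _d:
        if a:
            # plain key/value pairs at the current level
            _new_result.update(dict([[c, _d] for c, [_d] in b]))
        else:
            # first row is the parent key; strip one 2-space indent level
            # from the remaining rows and recurse
            _new_result[b[0][0]] = group_data([[c[2:], _d] for c, _d in b[1:]])
    return _new_result

print(json.dumps(group_data([[a, b] for a, *b in _data]), indent=4))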
Output:
{
    "xyz1": "14",
    "xyz2": "35",
    "xyz3": "14",
    "xyz4": {
        "sub1_xyz4": {
            "sub1_sub1_xyz4": "45",
            "sub2_sub1_xyz4": "b1fawe"
        },
        "sub2 xyz4": "455"
    },
    "xyz5": "2424"
}
where content is:
xyz1 : 14
xyz2 : 35
xyz3 : 14
xyz4
  sub1_xyz4
    sub1_sub1_xyz4 : 45
    sub2_sub1_xyz4 : b1fawe
  sub2 xyz4 : 455
xyz5 : 2424
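To see what the groupby step in this answer produces, here is a small illustrative sketch. The `rows` list is a hand-written stand-in for part of the parsed data, and `is_top_level_pair` is a hypothetical name for the key function that the answer writes as a lambda.

```python
from itertools import groupby

# (key text with its indentation, list holding the value if there is one)
rows = [
    ("xyz3", ["14"]),
    ("xyz4", []),
    ("  sub1_xyz4", []),
    ("    sub1_sub1_xyz4", ["45"]),
    ("  sub2 xyz4", ["455"]),
    ("xyz5", ["2424"]),
]

def is_top_level_pair(row):
    """True for an unindented row that carries a value."""
    key, value = row
    return bool(value) and not key.startswith(" ")

# groupby only merges *consecutive* rows that share the same key result.
runs = [(flag, [key for key, _ in group])
        for flag, group in groupby(rows, key=is_top_level_pair)]
for flag, keys in runs:
    print(flag, keys)
# True ['xyz3']
# False ['xyz4', '  sub1_xyz4', '    sub1_sub1_xyz4', '  sub2 xyz4']
# True ['xyz5']
```

Each False run is treated as one parent key followed by its children. Because groupby merges consecutive rows, two nested blocks that follow each other directly with no plain key/value row between them would fall into the same False run, so this approach fits inputs shaped like the example.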