I have a text file in the format below, where the number of leading hyphens indicates each item's level in the hierarchy.
category1 : 0120391123123
- subcategory : 0120391123123
-- subsubcategory : 019301948109
--- subsubsubcategory : 013904123908
---- subsubsubsubcategory : 019341823908
- subcategory2 : 0934810923801
-- subsubcategory2 : 09341829308123
category2: 1309183912309
- subcategory : 10293182094
...
How can I programmatically convert a list like this into JSON in the following form?
[
    {
        "category1":"0120391123123"
    },
    [
        {
            "subcategory":"0120391123123"
        },
        [
            {
                "subsubcategory":"019301948109"
            },
            [
                {
                    "subsubsubcategory":"013904123908"
                },
                [
                    {
                        "subsubsubsubcategory":"019341823908"
                    }
                ]
            ]
        ]
    ],
    [
        {
            "subcategory2":"0934810923801"
        },
        [
            {
                "subsubcategory2":"09341829308123"
            }
        ]
    ],
    [
        {
            "category2":"1309183912309"
        },
        [
            {
                "subcategory":"10293182094"
            }
        ]
    ]
]
Answer 0 (score: 0)
You can use recursion with itertools.groupby:
import itertools
import json
import re

s = """
category1 : 0120391123123
- subcategory : 0120391123123
-- subsubcategory : 019301948109
--- subsubsubcategory : 013904123908
---- subsubsubsubcategory : 019341823908
- subcategory2 : 0934810923801
-- subsubcategory2 : 09341829308123
category2: 1309183912309
- subcategory : 10293182094
"""

# Drop the blank lines produced by the leading and trailing newlines.
data = list(filter(None, s.split('\n')))

def group_data(d):
    # Base case: a single line becomes a one-item list holding one dict.
    if len(d) == 1:
        return [dict([re.split(r'\s*:\s*', d[0].strip())])]
    # Alternate runs of parent lines (no leading '-') and child lines.
    # This assumes every parent run is followed by at least one child line.
    grouped = [[a, list(b)] for a, b in itertools.groupby(d, key=lambda x: not x.startswith('-'))]
    _group = [[grouped[i][-1], grouped[i+1][-1]] for i in range(0, len(grouped), 2)]
    # Strip one '-' from each child line and recurse; strip() keeps the keys free of leading spaces.
    _c = [[dict([re.split(r'\s*:\s*', i.strip()) for i in a]), group_data([c[1:] for c in b])] for a, b in _group]
    return [i for b in _c for i in b]

print(json.dumps(group_data(data), indent=4))
Output:
[
    {
        "category1": "0120391123123"
    },
    [
        {
            "subcategory": "0120391123123"
        },
        [
            {
                "subsubcategory": "019301948109"
            },
            [
                {
                    "subsubsubcategory": "013904123908"
                },
                [
                    {
                        "subsubsubsubcategory": "019341823908"
                    }
                ]
            ]
        ],
        {
            "subcategory2": "0934810923801"
        },
        [
            {
                "subsubcategory2": "09341829308123"
            }
        ]
    ],
    {
        "category2": "1309183912309"
    },
    [
        {
            "subcategory": "10293182094"
        }
    ]
]
Note: this answer assumes that "category2" should sit at the same level as "category1" in the final output, since neither line is preceded by a "-".
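The note above rests on reading an item's level from its leading hyphens. As a minimal sketch of that rule, the snippet below splits one line into a depth, a key, and a value. The names `LINE_RE` and `parse_line` are hypothetical and not part of the answer's code, and the pattern assumes every line looks like `-* key : value`.

```python
import re

# Hypothetical pattern: optional run of '-', then "key : value".
LINE_RE = re.compile(r'^(-*)\s*([^:]+?)\s*:\s*(.*?)\s*$')

def parse_line(line):
    """Return (depth, key, value), where depth is the number of leading hyphens."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    dashes, key, value = match.groups()
    return len(dashes), key, value

print(parse_line("category2: 1309183912309"))          # (0, 'category2', '1309183912309')
print(parse_line("-- subsubcategory : 019301948109"))  # (2, 'subsubcategory', '019301948109')
```

Both "category1" and "category2" come out at depth 0 under this rule, which is why the answer places them side by side.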
Answer 1 (score: 0)
Use a recursive function that splits the file contents into chunks, in a divide-and-conquer fashion:
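from pprint import pprint

# Read the file and drop blank lines (e.g. the one left by a trailing newline).
with open('temp.txt', 'r') as f:
    content = [line for line in f.read().split('\n') if line.strip()]

def foo(splitcontent):
    index = 0
    reqlist = []
    while index < len(splitcontent):
        # A line without a leading '-' is an item at the current level.
        if splitcontent[index][0] != '-':
            key, value = splitcontent[index].split(':')
            reqlist.append({key.strip(): value.strip()})
            index += 1
        # Collect the following '-' lines, removing one '-' from each.
        templist = []
        while index < len(splitcontent) and splitcontent[index][0] == '-':
            templist.append(splitcontent[index][1:])
            index += 1
        # Recurse on the collected block; a non-empty result becomes a nested list.
        intermediatelist = foo(templist)
        if intermediatelist:
            reqlist.append(intermediatelist)
    return reqlist

pprint(foo(content))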
Output:
[{'category1': '0120391123123'},
 [{'subcategory': '0120391123123'},
  [{'subsubcategory': '019301948109'},
   [{'subsubsubcategory': '013904123908'},
    [{'subsubsubsubcategory': '019341823908'}]]],
  {'subcategory2': '0934810923801'},
  [{'subsubcategory2': '09341829308123'}]],
 {'category2': '1309183912309'},
 [{'subcategory': '10293182094'}]]
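For comparison, the same nested list can be built in a single pass with a stack instead of recursion. This is a different technique from the one in the answer, shown only as a sketch. The function name `build_nested` is hypothetical, and the code assumes each line has the form `-* key : value` and that no line jumps more than one level deeper than the line before it.

```python
import json

def build_nested(lines):
    """Build [dict, [children...], dict, ...] from hyphen-prefixed lines."""
    root = []
    stack = [root]  # stack[d] is the list currently being filled at depth d
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        body = line.lstrip('-')
        depth = len(line) - len(body)  # number of leading hyphens
        key, _, value = body.partition(':')
        if depth > len(stack):
            raise ValueError(f"line skips a nesting level: {raw!r}")
        # Close any lists deeper than this line's level.
        del stack[depth + 1:]
        # Going one level deeper: open a new child list under the current one.
        if depth == len(stack):
            child = []
            stack[-1].append(child)
            stack.append(child)
        stack[depth].append({key.strip(): value.strip()})
    return root

sample = """
category1 : 0120391123123
- subcategory : 0120391123123
-- subsubcategory : 019301948109
- subcategory2 : 0934810923801
category2: 1309183912309
- subcategory : 10293182094
"""

print(json.dumps(build_nested(sample.split('\n')), indent=4))
```

Because each line is visited once and nothing is re-sliced, this variant avoids copying the child lines at every level, at the cost of being less direct than the recursive version.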