I have the following string:
...some random text...
{
"1":"one",
"2":"two",
"3":{
"31":{
"311":"threeoneone",
"312":"threeonetwo",
"313":"threeonethree"
}
},
"4":{
"41":"fourone",
"42":"fourtwo",
"43":"fourthree"
},
"5":"five",
"6":"six"
}
...some more random text...
How can I extract the JSON from it? This is what I want:
{
"1": "one",
"2": "two",
"3": {
"31": {
"311": "threeoneone",
"312": "threeonetwo",
"313": "threeonethree"
}
},
"4": {
"41": "fourone",
"42": "fourtwo",
"43": "fourthree"
},
"5": "five",
"6": "six"
}
Is there a Pythonic way to do this?
Answer 0 (score: 2)
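A more robust solution finds JSON objects in a file with mixed content without making any assumptions about that content (the non-JSON text may contain unpaired curly braces, the JSON may contain strings with unpaired curly braces, and so on). It iterates over every occurrence of { and, for each one, over every occurrence of } to its right, and tries to parse the substring between the two braces as JSON: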
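import json

# Positions of every closing brace in s.
right_indices = [i for i, c in enumerate(s) if c == '}']
i = 0
while i < len(s) - 1:
    if s[i] == '{':
        # Try each closing brace to the right of this opening brace.
        for j in right_indices:
            if i < j:
                try:
                    print(json.loads(s[i: j + 1]))
                    # Resume right after the parsed object (i += 1 below).
                    i = j
                    break
                except json.decoder.JSONDecodeError:
                    pass
    i += 1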
Given your input string in the variable s, this outputs:
{'1': 'one', '2': 'two', '3': {'31': {'311': 'threeoneone', '312': 'threeonetwo', '313': 'threeonethree'}}, '4': {'41': 'fourone', '42': 'fourtwo', '43': 'fourthree'}, '5': 'five', '6': 'six'}
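As an alternative sketch of the same idea, the standard library's json.JSONDecoder.raw_decode parses a JSON value starting at a given index and reports where it ended, so there is no need to try every closing brace. The function name extract_json_objects and the sample string are illustrative and not part of the answer above.

```python
import json


def extract_json_objects(s):
    """Yield every JSON object found in s, scanning left to right."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Jump to the next candidate opening brace.
        start = s.find('{', pos)
        if start == -1:
            return
        try:
            # raw_decode returns the parsed value and the index just past it.
            obj, end = decoder.raw_decode(s, start)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            pos = start + 1
            continue
        yield obj
        pos = end


# Made-up input with stray braces outside and inside the JSON.
sample = 'noise } {"1": "one", "3": {"31": "a}b"}} more { noise'
print(list(extract_json_objects(sample)))
```

For this sample the generator should yield a single dictionary, because the stray braces in the surrounding text do not start a valid JSON object.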
Answer 1 (score: 0)
Assuming the JSON is well formed, and that everything enclosed in curly braces is a JSON object:
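jsons = []
# f is the path to the input file.
with open(f) as o:
    parse_to_json = ""
    depth = 0  # current brace nesting level
    for line in o:
        # An object starts on a line that consists only of "{".
        if depth == 0 and line.strip() != "{":
            continue
        parse_to_json += line
        # Count braces so that nested objects do not end the block early.
        depth += line.count("{") - line.count("}")
        if depth == 0:
            # Store the finished block before resetting the buffer.
            jsons.append(parse_to_json)
            parse_to_json = ""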
Now convert all the strings in the jsons list with your favorite JSON parsing library.
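For instance, with the standard library's json module the conversion could look like the sketch below. The contents of jsons here are a made-up stand-in for what the loop collects.

```python
import json

# Hypothetical stand-in for the strings collected by the loop above.
jsons = ['{\n"1":"one",\n"2":"two"\n}\n', '{\n"5":"five"\n}\n']

# json.loads parses one JSON document per string.
parsed = [json.loads(chunk) for chunk in jsons]
print(parsed)
```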
Answer 2 (score: 0)
You can use a regular expression to identify the JSON, for example:
import re
import json
text = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.
{
"1":"one",
"2":"two",
"3":{
"31":{
"311":"threeoneone",
"312":"threeonetwo",
"313":"threeonethree"
}
},
"4":{
"41":"fourone",
"42":"fourtwo",
"43":"fourthree"
},
"5":"five",
"6":"six"
}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.
"""
result = re.search(r'[a-zA-Z0-9 ,.\n]+(\{[a-zA-Z0-9 \":\{\},\n]+\})[a-zA-Z0-9 ,.\n]+', text)
try:
    json_string = result.group(1)
    json_data = json.loads(json_string)
    print(json_data)
except AttributeError:
    # re.search returns None when nothing matches, so .group raises AttributeError.
    print("No json found!")
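Keep in mind that the character classes in this pattern only cover letters, digits, spaces, and a few punctuation marks, so the pattern is tied to input like the sample. A small check with made-up strings illustrates that a value containing a hyphen is no longer matched:

```python
import re

# Same pattern as in the answer above.
pattern = r'[a-zA-Z0-9 ,.\n]+(\{[a-zA-Z0-9 \":\{\},\n]+\})[a-zA-Z0-9 ,.\n]+'

# Made-up inputs for illustration.
simple = 'before {"a":"b"} after'
hyphen = 'before {"a":"b-c"} after'

# The first search finds the object; the second finds nothing,
# because "-" is not in the character class inside the group.
print(re.search(pattern, simple) is not None)
print(re.search(pattern, hyphen) is not None)
```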