How do I extract a JSON object embedded between paragraphs of a string?

Time: 2019-04-04 22:08:45

Tags: python

I have the following string:

...some random text...

{
   "1":"one",
   "2":"two",
   "3":{
      "31":{
         "311":"threeoneone",
         "312":"threeonetwo",
         "313":"threeonethree"
      }
   },
   "4":{
      "41":"fourone",
      "42":"fourtwo",
      "43":"fourthree"
   },
   "5":"five",
   "6":"six"
}

...some more random text...

How can I extract the JSON from it? This is what I want:

{
  "1": "one",
  "2": "two",
  "3": {
    "31": {
      "311": "threeoneone",
      "312": "threeonetwo",
      "313": "threeonethree"
    }
  },
  "4": {
    "41": "fourone",
    "42": "fourtwo",
    "43": "fourthree"
  },
  "5": "five",
  "6": "six"
}

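Is there a Pythonic way to do this?

3 answers:

Answer 0 (score: 2)

A more robust solution finds JSON objects in a file with mixed content without making any assumptions about that content. The non-JSON content may contain unpaired curly braces, the JSON content may contain strings with unpaired curly braces, and so on. The idea is to iterate over each occurrence of { and, for each one, over each occurrence of } to its right, trying to parse the substring between the two braces as JSON: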

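import json

# s holds the input string
right_indices = [i for i, c in enumerate(s) if c == '}']
i = 0
while i < len(s) - 1:
    if s[i] == '{':
        # try every closing brace to the right of this opening brace
        for j in right_indices:
            if i < j:
                try:
                    print(json.loads(s[i: j + 1]))
                    i = j  # the i += 1 below resumes right after the parsed object
                    break
                except json.decoder.JSONDecodeError:
                    pass
    i += 1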

Given your input string in the variable s, this outputs:

{'1': 'one', '2': 'two', '3': {'31': {'311': 'threeoneone', '312': 'threeonetwo', '313': 'threeonethree'}}, '4': {'41': 'fourone', '42': 'fourtwo', '43': 'fourthree'}, '5': 'five', '6': 'six'}
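As a possible variation on the same idea, the standard library's `json.JSONDecoder.raw_decode` can parse an object starting at a given index and report where it ended, which avoids trying every closing brace. The following is only a sketch: the function name `extract_json_objects` and the `sample` string are made up for illustration and are not part of the answer above.

```python
import json


def extract_json_objects(text):
    """Yield each JSON object found in text, scanning left to right."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return  # no more opening braces
        try:
            # raw_decode returns the parsed value and the index just past it
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # this brace does not start valid JSON, move on
            continue
        yield obj
        pos = end


# Hypothetical input with stray braces around one valid object
sample = 'intro {"1": "one", "3": {"31": "x"}} outro } { not json'
print(list(extract_json_objects(sample)))
```

Because the decoder stops at the end of the first complete value, stray braces in the surrounding text are simply skipped.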

Answer 1 (score: 0)

Assuming the JSON is well formed, and assuming everything enclosed in curly braces is a JSON object:

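jsons = []
parsing_json_flag = False
parse_to_json = ""
with open(f) as o:  # f is the path to the input file
    for line in o:
        stripped = line.rstrip()  # drop the trailing newline before comparing
        # a brace alone in column 0 opens or closes a top-level object
        if stripped == "{":
            parsing_json_flag = True
        if parsing_json_flag:
            parse_to_json += line
        if parsing_json_flag and stripped == "}":
            parsing_json_flag = False
            jsons.append(parse_to_json)  # store the block before resetting it
            parse_to_json = ""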

Now convert all the strings in the list jsons with your favorite JSON parsing library.
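For instance, with the standard library's `json` module the conversion step might look like the sketch below. The contents of `jsons` here are a hypothetical stand-in for whatever the loop collected, and `parsed` is a name introduced only for this example.

```python
import json

# Hypothetical stand-in for the list of raw text blocks collected earlier
jsons = ['{\n   "1":"one",\n   "2":"two"\n}\n']

parsed = []
for chunk in jsons:
    try:
        parsed.append(json.loads(chunk))
    except json.JSONDecodeError as exc:
        # A brace-delimited block is not guaranteed to be valid JSON
        print(f"Skipping a block that is not valid JSON: {exc}")

print(parsed)
```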

Answer 2 (score: 0)

You can use a regular expression to identify the JSON, for example:

import re
import json

text = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.

{
   "1":"one",
   "2":"two",
   "3":{
      "31":{
         "311":"threeoneone",
         "312":"threeonetwo",
         "313":"threeonethree"
      }
   },
   "4":{
      "41":"fourone",
      "42":"fourtwo",
      "43":"fourthree"
   },
   "5":"five",
   "6":"six"
}

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.
"""

result = re.search(r'[a-zA-Z0-9 ,.\n]+(\{[a-zA-Z0-9 \":\{\},\n]+\})[a-zA-Z0-9 ,.\n]+', text)

# re.search returns None when nothing matches, so check before calling group()
if result is None:
    print("No json found!")
else:
    json_string = result.group(1)
    json_data = json.loads(json_string)
    print(json_data)
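The character classes in this pattern only admit letters, digits, spaces, newlines, and a few punctuation marks, so whether it matches depends on what the JSON values contain. The sketch below, using two made-up one-line inputs, shows the same pattern matching a plain object and finding nothing once a value contains a hyphen.

```python
import re

# Same pattern as in the answer above
pattern = r'[a-zA-Z0-9 ,.\n]+(\{[a-zA-Z0-9 \":\{\},\n]+\})[a-zA-Z0-9 ,.\n]+'

# Hypothetical inputs for illustration
plain = 'before {"1":"one"} after'
hyphenated = 'before {"1":"one-two"} after'

plain_match = re.search(pattern, plain)
hyphen_match = re.search(pattern, hyphenated)

print(plain_match.group(1))  # the captured JSON text
print(hyphen_match)          # None, because '-' is outside the character class
```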