Parsing an HTTP array response with Python

Date: 2018-02-09 17:11:42

Tags: python json python-3.x http parsing

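I am trying to parse an HTTP response as JSON, but it fails with a character error. When I instead iterate over the response with a for loop, it splits everything into single characters. Is there a better way to parse this response?

Code: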

    _url = self.MAIN_URL
    try:
        _request = self.__webSession.get(_url, cookies=self.__cookies)
        if _request.status_code != 200:
            self.log("Request failed with code: {}. URL: {}".format(_request.status_code, _url))
            return
    except Exception as err:
        self.log("[e4] Web-request error: {}. URL: {}".format(err, _url))
        return

    _text = _request.json()

json.loads() returns the following:

 Expecting value: line 1 column 110 (char 109)

The HTTP response that needs to be parsed:

[
  [
    9266939,
    'Value1',
    'Value2',
    'Value3',
            ,
    'Value4',
        [
            [
                'number',
                'number2',
                    [
                        'value',
                               ,
                        'value2'
                    ]
            ]
        ]
  ],
  [
    5987798,
    'Value1',
    'Value2',
            ,
    'Value3',
    'Value4',
        [
            [
                'number',
                'number2',
                    [
                        'value',
                        'value2'
                    ]
            ]
        ]
  ]
]

1 Answer:

Answer 0 (score: 0)

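Although the error message is confusing because of its line and column numbers, the JSON format never accepts single quotes as string delimiters, so the given HTTP response is not valid JSON. Strings must be enclosed in double quotes.

So you would have to change the input like this (if you control it):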
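A minimal sketch of both symptoms from the question, using a shortened, made-up sample rather than the real response: the unparsed body is still a `str`, so a for loop walks it one character at a time, and `json.loads` rejects the single-quoted form while accepting the double-quoted one.

```python
import json

# Shortened, made-up samples in the same style as the response in the question.
single_quoted = "[9266939, 'Value1']"
double_quoted = '[9266939, "Value1"]'

# The unparsed body is a plain str, so iterating yields single characters.
first_chars = [ch for ch in single_quoted][:3]
print(first_chars)

# Single-quoted strings are not valid JSON, so parsing fails.
try:
    json.loads(single_quoted)
    single_quoted_ok = True
except json.JSONDecodeError as err:
    single_quoted_ok = False
    print("Rejected:", err)

# The same data with double quotes parses into a normal Python list.
parsed = json.loads(double_quoted)
print(parsed)
```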

[
  [
    9266939,
    "Value1",
    "Value2",
    "Value3",
    "Value4",
    [
        [
        "number",
        "number2",
            [
            "value",
            "value2"
            ]
        ]
...

If you do not control the HTTP response you are parsing, you can replace all single quotes with double quotes before parsing:

import json

http_response_string = _request.text  # raw response body, assuming the requests response from the question
adjusted_http_response_string = http_response_string.replace("'", '"')
data = json.loads(adjusted_http_response_string)

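This does, of course, risk replacing single quotes (or apostrophes) that are not string delimiters. Still, it may solve the problem well enough and will work most of the time.

Edit

Further cleanup, as requested in the comments: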
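To illustrate that risk, here is a small sketch with two made-up bodies (neither comes from the real response). The helper name `naive_parse` is hypothetical. The second body contains an apostrophe inside a value, so the blanket replacement produces a string that no longer parses.

```python
import json


def naive_parse(body):
    """Replace every single quote with a double quote, then parse.

    Returns None when the result is not valid JSON.
    """
    try:
        return json.loads(body.replace("'", '"'))
    except json.JSONDecodeError:
        return None


# Works: every single quote is a string delimiter.
clean = naive_parse("['Value1', 'Value2']")

# Fails: the apostrophe in O'Brien is turned into a stray double quote.
broken = naive_parse("['O'Brien', 'Value2']")

print(clean)
print(broken)
```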

import json
import re

http_response_string = _request.text  # raw response body, assuming the requests response from the question

# More advanced replacement of ' with ", expecting
# strings to always come after at least four spaces,
# and always end in either comma, colon, or newline.
adjusted_http_response_string = re.sub(
    r"(    )'", r'\1"',
    re.sub(r"'([,:\n])", r'"\1', http_response_string))

# Collapse faulty empty elements such as ",  ," into a single ",",
# since JSON does not allow them.
adjusted_http_response_string = re.sub(
    r",(\s*,)*", ",", adjusted_http_response_string)

data = json.loads(adjusted_http_response_string)
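
As a possible alternative that is not part of the original answer: because the response looks like a Python list literal, `ast.literal_eval` from the standard library may be able to read it once the empty elements are collapsed, with no quote replacement at all. The sketch below assumes the body contains only numbers, quoted strings, and nested lists. The function name `parse_loose_array` and the shortened sample are made up for illustration.

```python
import ast
import re


def parse_loose_array(body):
    """Parse a list-literal-like body that may contain empty elements."""
    # Collapse runs such as ",  ," into a single comma.
    collapsed = re.sub(r",(\s*,)+", ",", body)
    # literal_eval accepts single- and double-quoted strings, numbers and
    # nested lists, and does not execute arbitrary code.
    return ast.literal_eval(collapsed)


# Shortened sample modelled on the response in the question.
sample = """[
  [
    9266939,
    'Value1',
            ,
    'Value4',
        [
            [
                'number',
                    [
                        'value',
                               ,
                        'value2'
                    ]
            ]
        ]
  ]
]"""

result = parse_loose_array(sample)
print(result)
```

Note that this drops the empty elements instead of keeping a placeholder, so later values shift position, and that the comma-collapsing regex would also alter consecutive commas inside a string value.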