How do I parse a strange Python Requests response?

Time: 2019-08-28 11:27:50

Tags: python json xml web-scraping python-requests

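I'm doing a job for a client where I need to pull some data from their website. Hitting the endpoint with a random postcode succeeds and returns JSON, but the response isn't what I expected.

It does look like valid JSON, but the HTML key contains escaped HTML, and the payload has newlines and carriage returns (\r\n) before and after each entry.

I can parse it into a dictionary with: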

json_string = json.loads(r.text)

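However, I can't access the HTML key inside it, because Python reports the value under 'd' as a string.

I really don't know what to do here. How can I parse this in Python so that I can feed the HTML into Beautiful Soup?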

 {'d': '{\r\n  "result": "200",\r\n  "HTML": "<table style=\\"max-width:750px;\\"><tr id=\'resultsHeader2\'><th class=\'thMid\'>Select</td><th class=\'thMid\'>Address</td><th class=\'thMid\'>Street</td><th class=\'thMid\'>Area</td><th class=\'thMid\'>Postcode</td></tr><tr class=\'tResults\' id=\'uprnRow0\'><td id=\'uprnButton0\'><button type=\'button\' onclick=\\"changeText(\'uprnButton0\',\'Loading\');populAddr(\'105 BERKSHIRE DRIVE RAMLEAZE SWINDON SN1 5RP\');getobject(\'divAddress\').innerHTML = \'\';GetInfoAndRoundsFor(\'345634564356\',\'SWN\');\\" title=\'Get Calendar for this address\'>Show</button></td><td>105</td><td>BERKSHIRE DRIVE</td><td>RAMLEAZE<br/>SWINDON</td><td>SN1 5RP</td><tr class=\'tResults\' id=\'uprnRow1\'><td id=\'uprnButton1\'><button type=\'button\' onclick=\\"changeText(\'uprnButton1\',\'Loading\');populAddr(\'150 BERKSHIRE DRIVE RAMLEAZE SWINDON SN15 5RP\');getobject(\'divAddress\').innerHTML = \'\';GetInfoAndRoundsFor(\'3456346435634\',\'SWN\');\\" title=\'Get Calendar for this address\'>Show</button></td><td>150</td><td>BERKSHIRE DRIVE</td><td>RAMLEAZE<br/>SWINDON</td><td>SN15 5RP</td><tr><td class=\'tableFoot\' colspan=\'5\'></tr></table>",\r\n  "r1": "Swindon",\r\n  "r2": "",\r\n  "r3": ""\r\n}'}

I haven't seen anything like this before, and it looks scary... :-)

Update, here is r.text:

{"d":"{\r\n  \"result\": \"200\",\r\n  \"HTML\": \"\u003ctable style=\\\"max-width:750px;\\\"\u003e\u003ctr id=\u0027resultsHeader2\u0027\u003e\u003cth class=\u0027thMid\u0027\u003eSelect\u003c/td\u003e\u003cth class=\u0027thMid\u0027\u003eAddress\u003c/td\u003e\u003cth class=\u0027thMid\u0027\u003eStreet\u003c/td\u003e\u003cth class=\u0027thMid\u0027\u003eArea\u003c/td\u003e\u003cth class=\u0027thMid\u0027\u003ePostcode\u003c/td\u003e\u003c/tr\u003e\u003ctr class=\u0027tResults\u0027 id=\u0027uprnRow0\u0027\u003e\u003ctd id=\u0027uprnButton0\u0027\u003e\u003cbutton type=\u0027button\u0027 onclick=\\\"changeText(\u0027uprnButton0\u0027,\u0027Loading\u0027);populAddr(\u00275 BERKSHIRE DRIVE RAMLEAZE SWINDON SN5 5RP\u0027);getobject(\u0027divAddress\u0027).innerHTML = \u0027\u0027;GetInfoAndRoundsFor(\u00345643564536\u0027,\u0027SWN\u0027);\\\" title=\u0027Get Calendar for this address\u0027\u003eShow\u003c/button\u003e\u003c/td\u003e\u003ctd\u003e5\u003c/td\u003e\u003ctd\u003eBERKSHIRE DRIVE\u003c/td\u003e\u003ctd\u003eRAMLEAZE\u003cbr/\u003eSWINDON\u003c/td\u003e\u003ctd\u003eSN5 5RP\u003c/td\u003e\u003ctr class=\u0027tResults\u0027 id=\u0027uprnRow1\u0027\u003e\u003ctd id=\u0027uprnButton1\u0027\u003e\u003cbutton type=\u0027button\u0027 onclick=\\\"changeText(\u0027uprnButton1\u0027,\u0027Loading\u0027);populAddr(\u002715 BERKSHIRE DRIVE RAMLEAZE SWINDON SN5 5RP\u0027);getobject(\u0027divAddress\u0027).innerHTML = \u0027\u0027;GetInfoAndRoundsFor(\u3456345634575\u0027,\u0027SWN\u0027);\\\" title=\u0027Get Calendar for this address\u0027\u003eShow\u003c/button\u003e\u003c/td\u003e\u003ctd\u003e15\u003c/td\u003e\u003ctd\u003eBERKSHIRE DRIVE\u003c/td\u003e\u003ctd\u003eRAMLEAZE\u003cbr/\u003eSWINDON\u003c/td\u003e\u003ctd\u003eSN5 5RP\u003c/td\u003e\u003ctr\u003e\u003ctd class=\u0027tableFoot\u0027 colspan=\u00275\u0027\u003e\u003c/tr\u003e\u003c/table\u003e\",\r\n  \"r1\": \"Swindon\",\r\n  \"r2\": \"\",\r\n  \"r3\": \"\"\r\n}"}

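1 answer:

Answer 0: (score: 1)

Something strange is going on with this service. Check my comment on the question. The "d" property of the outer JSON object appears to hold a second JSON document, stored as a string, so it has to be decoded a second time: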

json.loads(json.loads(r.text)['d'])

I get:

{
     u'HTML': u'<table style="max-width:750px;"><tr id=\'resultsHeader2\'><th class=\'thMid\'>Select</td><th class=\'thMid\'>Address</td><th class=\'thMid\'>Street</td><th class=\'thMid\'>Area</td><th class=\'thMid\'>Postcode</td></tr><tr class=\'tResults\' id=\'uprnRow0\'><td id=\'uprnButton0\'><button type=\'button\' onclick="changeText(\'uprnButton0\',\'Loading\');populAddr(\'105 BERKSHIRE DRIVE RAMLEAZE SWINDON SN1 5RP\');getobject(\'divAddress\').innerHTML = \'\';GetInfoAndRoundsFor(\'345634564356\',\'SWN\');" title=\'Get Calendar for this address\'>Show</button></td><td>105</td><td>BERKSHIRE DRIVE</td><td>RAMLEAZE<br/>SWINDON</td><td>SN1 5RP</td><tr class=\'tResults\' id=\'uprnRow1\'><td id=\'uprnButton1\'><button type=\'button\' onclick="changeText(\'uprnButton1\',\'Loading\');populAddr(\'150 BERKSHIRE DRIVE RAMLEAZE SWINDON SN15 5RP\');getobject(\'divAddress\').innerHTML = \'\';GetInfoAndRoundsFor(\'3456346435634\',\'SWN\');" title=\'Get Calendar for this address\'>Show</button></td><td>150</td><td>BERKSHIRE DRIVE</td><td>RAMLEAZE<br/>SWINDON</td><td>SN15 5RP</td><tr><td class=\'tableFoot\' colspan=\'5\'></tr></table>',
     u'r3': u'',
     u'result': u'200',
     u'r2': u'',
     u'r1': u'Swindon'
}
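
For what it's worth, here is a minimal Python 3 sketch of the whole path from response text to table rows. The names decode_wrapped_json and CellCollector are made up for illustration, and sample_text is a shortened stand-in for r.text, not the real response. The row extraction uses the standard library's html.parser only so that the example runs without extra packages; it is a swapped-in substitute, and the same payload["HTML"] string could instead be passed to BeautifulSoup(payload["HTML"], "html.parser").

```python
import json
from html.parser import HTMLParser


def decode_wrapped_json(text, key="d"):
    """Decode JSON text whose `key` value is itself a JSON document in a string."""
    outer = json.loads(text)  # first pass gives {"d": "<JSON text>"}
    inner = outer[key]
    # Decode a second time only when the value really is a string.
    return json.loads(inner) if isinstance(inner, str) else inner


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True
        elif tag == "br" and self._in_cell:
            self.rows[-1][-1] += " "  # keep words on either side of <br/> apart

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


# Hypothetical, shortened sample data shaped like the response in the question.
sample_html = (
    "<table><tr><th>Address</th><th>Postcode</th></tr>"
    "<tr><td>105</td><td>SN1 5RP</td></tr></table>"
)
sample_text = json.dumps({"d": json.dumps({"result": "200", "HTML": sample_html})})

payload = decode_wrapped_json(sample_text)
collector = CellCollector()
collector.feed(payload["HTML"])
collector.close()
print(collector.rows)  # [['Address', 'Postcode'], ['105', 'SN1 5RP']]
```

With a real response you would call decode_wrapped_json(r.text) instead of using sample_text. The isinstance check is only there so the helper does not fail if the service ever returns "d" as a plain object rather than a string.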