I'm getting the response of an API in JSON format. It looks like this:
page = requests.get(link)
page_dict = json.loads(page.content)
print page_dict
>> {u'sm_api_title': u'The Biggest Mysteries of Missing Malaysian Flight MH370', u'sm_api_keyword_array': [u'flight', u'plane', u'pilot', u'crash', u'passenger'], u'sm_api_content': u' Since the plane&#039;s disappearance early Saturday, revelations about the passenger list and plane&#039;s flight plan have left officials scrambling to decipher new complicated clues. The most dangerous parts of a flight are traditionally the takeoff and landing, but the missing jetliner disappeared about two hours into a six-hour flight, when it should have been cruising safely around 35,000 feet. The last plane to crash at altitude was Air France Flight 447, which crashed during a thunderstorm in the Atlantic Ocean en route from Rio De Janeiro to Paris. A day after the flight disappeared the biggest question authorities are asking is did the plane turn around and why? The first officer on the flight was identified as Fariq Hamid, 27, and had about 2,800 flight hours since 2007.', u'sm_api_limitation': u'Waited 0 extra seconds due to API limited mode, 89 requests left to make for today.', u'sm_api_character_count': u'773'}
As you can see, the response contains HTML-escaped characters such as &#039;. What is the best way to clean this response?
I have used xmllib before and got it working, but when I use it with Django, it gives me a deprecation warning.
Thanks for your help!
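Answer 0 (score: 4)
You need to unescape the string to decode the HTML character entities. You can use the standard library's HTMLParser module to unescape an HTML-escaped string: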
import HTMLParser
parser = HTMLParser.HTMLParser()
unescaped_string = parser.unescape(html_escaped_string)
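The snippet above targets Python 2, matching the question. On Python 3 the equivalent standard-library call is html.unescape. The following is a minimal sketch of applying it to every string in the decoded JSON; the helper name unescape_values and the shortened sample dictionary are illustrative assumptions, not part of the original answer or the API.

```python
import html


def unescape_values(obj):
    """Recursively unescape HTML entities in every string of a decoded JSON value.

    Hypothetical helper for illustration; only strings are changed,
    other values (numbers, None, booleans) are returned untouched.
    """
    if isinstance(obj, str):
        return html.unescape(obj)
    if isinstance(obj, list):
        return [unescape_values(item) for item in obj]
    if isinstance(obj, dict):
        return {key: unescape_values(value) for key, value in obj.items()}
    return obj


# Shortened, made-up stand-in for the API response shown in the question.
page_dict = {
    "sm_api_title": "The Biggest Mysteries of Missing Malaysian Flight MH370",
    "sm_api_keyword_array": ["flight", "plane"],
    "sm_api_content": "Since the plane&#039;s disappearance early Saturday",
}

cleaned = unescape_values(page_dict)
print(cleaned["sm_api_content"])
```

Walking the whole structure, rather than unescaping a single field, keeps the cleanup in one place if other fields such as the title also turn out to contain entities.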