How do I pull text from a website into a dictionary?

Asked: 2016-02-25 00:28:05

Tags: python parsing text web webpage

I'm trying to get information from http://xkcd.com/info.0.json. It basically looks like a plain Python dictionary, which is what I want to convert it into. My current code is:

import urllib.request
with urllib.request.urlopen('http://xkcd.com/info.0.json') as response:
    html = [response.read()]
print(html)

which outputs:

[b'{"month": "2", "num": 1647, "link": "", "year": "2016", "news": "", "safe_title": "Diacritics", "transcript": "", "alt": "Using diacritics correctly is not my fort\\u00c3\\u00a9.", "img": "http:\\/\\/imgs.xkcd.com\\/comics\\/diacritics.png", "title": "Diacritics", "day": "24"}']

2 answers:

Answer 0 (score: 2)

You are receiving a JSON-encoded response. You can parse it with the json.loads() function:

import json
import urllib.request

with urllib.request.urlopen('http://xkcd.com/info.0.json') as response:
    data = json.loads(response.read().decode('utf8'))

>>> data
{'link': '', 'transcript': '', 'month': '2', 'year': '2016', 'alt': 'Using diacritics correctly is not my forté.', 'num': 1647, 'img': 'http://imgs.xkcd.com/comics/diacritics.png', 'day': '24', 'safe_title': 'Diacritics', 'news': '', 'title': 'Diacritics'}
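To see the two steps without touching the network, here is a minimal sketch that runs json.loads() on a hard-coded byte string. The payload is a trimmed copy of the output shown in the question, so treat it as sample data rather than the live response.

```python
import json

# Sample bytes shaped like the response body (trimmed from the question's output).
raw = b'{"month": "2", "num": 1647, "img": "http:\\/\\/imgs.xkcd.com\\/comics\\/diacritics.png", "title": "Diacritics", "day": "24"}'

# Step 1: bytes -> str. Step 2: JSON text -> Python objects.
data = json.loads(raw.decode('utf-8'))

print(type(data).__name__)  # dict
print(data['num'])          # 1647 (an int, because the JSON value is unquoted)
print(data['month'])        # 2 (a str, because the JSON value is quoted)
print(data['img'])          # http://imgs.xkcd.com/comics/diacritics.png
```

Note that the escaped slashes (\/) in the raw JSON come back as plain slashes. On Python 3.6 and later, json.loads() also accepts bytes directly, so the explicit decode is only required on older versions.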

Using the requests module is even easier:

import requests
response = requests.get('http://xkcd.com/info.0.json')
data = response.json()

>>> data
{'link': '', 'transcript': '', 'month': '2', 'year': '2016', 'alt': 'Using diacritics correctly is not my forté.', 'num': 1647, 'img': 'http://imgs.xkcd.com/comics/diacritics.png', 'day': '24', 'safe_title': 'Diacritics', 'news': '', 'title': 'Diacritics'}

requests saves you the trouble of decoding the incoming bytes and parsing the JSON yourself.

Answer 1 (score: 0)

In Python 2.7, import urllib2 to fetch the data and json to load it into a variable as a Python dictionary. Resource: here

import urllib2
import json
response = urllib2.urlopen('http://xkcd.com/info.0.json')
html = response.read().decode('utf8')

data = json.loads(html)
type(data) is dict # True
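If the same script has to run on both Python 2.7 and Python 3, one possible approach is to pick the right urlopen at import time. This is only a sketch: fetch_json and its timeout parameter are hypothetical names introduced here for illustration and do not come from either answer.

```python
import json
from contextlib import closing

try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2.7
    from urllib2 import urlopen


def fetch_json(url, timeout=10):
    """Fetch url and return its JSON body as Python objects (hypothetical helper)."""
    # closing() guarantees the response is closed on both versions;
    # urllib2 responses in Python 2.7 are not context managers themselves.
    with closing(urlopen(url, timeout=timeout)) as response:
        return json.loads(response.read().decode('utf-8'))


if __name__ == '__main__':
    data = fetch_json('http://xkcd.com/info.0.json')
    print(type(data) is dict)
```

The network call sits under the __main__ guard, so importing the module has no side effects.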