Question

I was looking at this codegolf problem and decided to try taking the Python solution and using urllib instead. I modified some sample code for manipulating JSON with urllib:

import urllib.request
import json

res = urllib.request.urlopen('http://api.stackexchange.com/questions?sort=hot&site=codegolf')
res_body = res.read()

j = json.loads(res_body.decode("utf-8"))

This gives:

➜  codegolf python clickbait.py
Traceback (most recent call last):
  File "clickbait.py", line 7, in <module>
    j = json.loads(res_body.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

If you go to http://api.stackexchange.com/questions?sort=hot&site=codegolf and look under "Headers", it says charset=utf-8. Why is urlopen giving me these weird results?

Answer 1

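res_body is gzip-compressed: the byte 0x8b at position 1 in the error is the second byte of the gzip magic number (0x1f 0x8b). As far as I know, urllib does not decompress responses for you by default.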

If you decompress the response from the API server, you get the data.

import urllib.request
import zlib
import json

with urllib.request.urlopen(
    'http://api.stackexchange.com/questions?sort=hot&site=codegolf'
    ) as res:

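    # 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
    decompressed_data = zlib.decompress(res.read(), 16+zlib.MAX_WBITS)
    # json.loads() no longer accepts an encoding argument (removed in Python 3.9), so decode explicitly
    j = json.loads(decompressed_data.decode('utf-8'))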

    print(j)
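A slightly more defensive variant, as a minimal sketch rather than a definitive implementation: it uses the standard gzip module and only decompresses when the response looks gzip-compressed, judged by the Content-Encoding header or by the leading magic bytes. The helper names decode_json_body and fetch_json are hypothetical, introduced here for illustration only.

```python
import gzip
import json
import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_json_body(raw, content_encoding=None):
    """Hypothetical helper: gunzip `raw` if needed, then parse it as UTF-8 JSON."""
    # Trust the header when present, otherwise sniff the magic bytes.
    is_gzip = (content_encoding or "").lower() == "gzip" or raw[:2] == GZIP_MAGIC
    if is_gzip:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def fetch_json(url):
    """Hypothetical helper: fetch `url` and return the parsed JSON body."""
    with urllib.request.urlopen(url) as res:
        return decode_json_body(res.read(), res.headers.get("Content-Encoding"))
```

Calling fetch_json('http://api.stackexchange.com/questions?sort=hot&site=codegolf') should then return the parsed response as a dict, assuming the server answers as it does in the question.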

Unable to decode unicode from Stack Exchange API
