Question

I can do the following in the Python shell:

>>> import urllib
>>> s='https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql'
>>> print urllib.unquote(s)
https://www.microsoft.com/de-at/store/movies/american-pie-präsentiert-nackte-tatsachen/8d6kgwzl63ql

However, if I do the same thing in a Python program, the URL is decoded incorrectly:

url = res.history[0].url if res.history else res.url
print '1111', url
print '2222', urllib.unquote(url)

1111 https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql
2222 https://www.microsoft.com/de-at/store/movies/american-pie-prÃ¤sentiert-nackte-tatsachen/8d6kgwzl63ql

Why is it decoded correctly in my Python shell but not in the program?

Answer 1

Here is what solved the problem:

url = urllib.unquote(str(res.url)).decode('utf-8', 'ignore')

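res.url is a unicode string, and urllib.unquote does not seem to handle unicode input the way one would expect: in Python 2 it turns each %XX escape into a single character, so %C3%A4 becomes the two characters Ã¤ rather than ä. The solution is therefore to convert it to a byte string (str) first, which is what the literal typed into the Python interpreter already is, and then decode the unquoted result from UTF-8 to Unicode.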
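The difference between the two results can be reproduced in isolation. The sketch below is only an illustration, and it assumes Python 3, where the function lives in urllib.parse rather than in the Python 2 urllib module used above. The variable names (quoted, raw, correct, mojibake) are made up for this example. It percent-decodes the URL to raw bytes and then interprets those bytes once as UTF-8 and once as Latin-1, which is roughly the difference between the shell output and the program output in the question.

```python
from urllib.parse import unquote, unquote_to_bytes

# The percent-encoded URL from the question.
quoted = ("https://www.microsoft.com/de-at/store/movies/"
          "american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql")

# Step 1: undo the percent-encoding only. %C3%A4 becomes the two bytes
# 0xC3 0xA4, which is the UTF-8 encoding of the single character "ä".
raw = unquote_to_bytes(quoted)

# Step 2a: interpret the bytes as UTF-8 (what the accepted fix does).
correct = raw.decode("utf-8")

# Step 2b: interpret each byte as its own Latin-1 character. This gives
# the garbled "Ã¤" seen in the program output.
mojibake = raw.decode("latin-1")

print(correct)
print(mojibake)

# In Python 3, unquote() decodes as UTF-8 by default, so it matches the
# correct result without any extra conversion.
assert unquote(quoted) == correct
```

On Python 3 this whole class of problem mostly disappears, because unquote takes an explicit encoding argument that defaults to UTF-8. On Python 2, encoding the unicode URL explicitly (for example with .encode('utf-8')) before unquoting is probably a safer variant of the str(...) call above, since str() raises an error if the URL contains non-ASCII characters.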

urllib.unquote does not decode the URL correctly
