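I have some URL-encoded data that I want to decode with Python. I tried the (accepted) answer here, but I still don't get the correct decoding. My code is as follows: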
import urllib2
name = '%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8'
print urllib2.unquote(urllib2.quote(name.encode("utf8"))).decode("utf8")
This should print нотификатор-олимпийских-и, but instead it prints %D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8
So I tried unquoting it a second time:
print urllib2.unquote(urllib2.unquote(urllib2.quote(name.encode("utf8"))).decode("utf8"))
But that gave me ноÑиÑикаÑоÑ-олимпийÑкиÑ-и
I'm not sure why this happens. Can anyone explain what I'm doing wrong and how I can fix it?
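Answer 0 (score: 1)
Too many quote/unquote operations: you are given a UTF-8 string that is already URL-encoded, so why encode it as UTF-8 and URL-encode it again? quote() escapes the existing % signs as %25, so the first unquote() only undoes that extra layer and hands back the original string. A single unquote() followed by one decode is enough: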
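import urllib  # Python 2: unquote lives in the urllib module
unquoted = urllib.unquote(name)  # a single unquote yields the raw UTF-8 bytes
print unquoted.decode('utf-8')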
# нотификатор-олимпийских-и
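For anyone on Python 3, here is a rough equivalent of the above. This is a sketch that assumes Python 3, where the functions live in urllib.parse and unquote() decodes the bytes as UTF-8 by default; the names round_trip and decoded are just for illustration. It also shows why the original quote-then-unquote round trip printed the input unchanged.

```python
from urllib.parse import quote, unquote

name = ('%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80'
        '-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85'
        '-%D0%B8')

# quote() escapes each '%' as '%25', so a single unquote() afterwards
# only removes that extra layer and returns the original string.
round_trip = unquote(quote(name))
print(round_trip == name)  # True

# One unquote() on the original string is all that is needed.
# In Python 3 the result is already a str, so no .decode() call follows.
decoded = unquote(name)
print(decoded)  # нотификатор-олимпийских-и
```

If the data were in some encoding other than UTF-8, unquote() accepts an encoding argument, but for this input the default works.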