URL-decoding UTF-8 in Python

Time: 2014-02-26 00:33:24

Tags: python utf-8 character-encoding

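I have some URL-encoded data that I want to decode with Python. I tried the (accepted) answer here, but I still don't get the correct decoding. My code is as follows: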

import urllib2

name = '%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8'

print urllib2.unquote(urllib2.quote(name.encode("utf8"))).decode("utf8")

This should print нотификатор-олимпийских-и, but instead it prints %D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8

So I tried unquoting it a second time:

print urllib2.unquote(urllib2.unquote(urllib2.quote(name.encode("utf8"))).decode("utf8"))

But that gives me ноÑиÑикаÑоÑ-олимпийÑкиÑ-и

I'm not sure why this happens. Can anyone explain what I am doing wrong and how to fix it?

1 answer:

Answer 0 (score: 1)

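Too many quote/unquote operations: you are given a UTF-8 string that is already URL-encoded, so why encode it as UTF-8 and URL-encode it again? Unquote it once, then decode the resulting bytes as UTF-8: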

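import urllib

unquoted = urllib.unquote(name)  # undo the percent-encoding once; yields UTF-8 bytes
print unquoted.decode('utf-8')   # decode those bytes into a unicode string
# нотификатор-олимпийских-и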
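
For anyone on Python 3, here is a rough sketch of the same idea. It assumes the standard-library `urllib.parse` module, which is not what the question uses (the question is Python 2 with `urllib2`). The variable names `double_quoted` and `mojibake` are only illustrative. It also tries to show the two symptoms from the question: quoting an already-encoded string just escapes the `%` signs, and interpreting the bytes with a single-byte encoding such as Latin-1 produces the garbled text.

```python
from urllib.parse import quote, unquote

name = ('%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80'
        '-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85'
        '-%D0%B8')

# A single unquote is enough: the percent-escapes are turned back into
# bytes and decoded as UTF-8 (the default encoding of unquote).
decoded = unquote(name)
print(decoded)

# Quoting the already-encoded string escapes every '%' as '%25',
# so one unquote afterwards only gets you back to where you started.
double_quoted = quote(name)
print(unquote(double_quoted) == name)

# Interpreting the UTF-8 bytes as Latin-1 gives garbled text similar to
# the question's second attempt; re-encoding as Latin-1 and decoding as
# UTF-8 recovers the intended string.
mojibake = unquote(name, encoding='latin-1')
print(mojibake.encode('latin-1').decode('utf-8') == decoded)
```

The point is the same as in the Python 2 answer: the input is already percent-encoded UTF-8, so exactly one unquote step followed by (or combined with) one UTF-8 decode is what you want.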