How to encode/decode Unicode characters such as ö in Python

Asked: 2014-03-22 02:04:08

Tags: python urllib2 python-unicode utf8-decode

I am using Python 2.6.6 on CentOS 6.4:

import json
import urllib2    

url = 'http://www.google.com.hk/complete/search?output=toolbar&hl=en&q=how%20to%20pronounce%20e'
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
opener.addheaders = [('Accept-Charset', 'utf-8')]
response = opener.open(url)
page = response.read()
print page

Result:

...<suggestion data="how to pronounce eyjafjallaj

Python dies at this point without printing any error message.

I think it dies because the next character is ö:

<toplevel>
<CompleteSuggestion>
<suggestion data="how to pronounce edinburgh"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce elle"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce edith"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce et al"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce eunice"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce english names"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce edamame"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce erudite"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce eyjafjallajökull"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce either"/>
</CompleteSuggestion>
</toplevel>

http://www.google.com.hk/complete/search?output=toolbar&hl=en&q=how%20to%20pronounce%20e

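This looks like a Unicode problem. I have tried encode('utf-8') and decode('utf-8') in many different ways, but it still dies. Any ideas?

PS: It seems I need to stay with urllib2 rather than urllib, because urllib ignores cookies, which causes other problems.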

1 Answer:

Answer 0: (score: 1)

response.read() returns a bytestring. Python should not die when printing a bytestring, because no character conversion takes place; the bytes are printed as-is.
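
To illustrate the difference between a bytestring and decoded text, here is a minimal sketch. The sample bytes are an assumption standing in for part of what response.read() might return, and the variable names are made up for this example:

```python
# -*- coding: utf-8 -*-
# Hypothetical sample standing in for part of response.read().
# In UTF-8, "ö" is the two bytes 0xC3 0xB6.
raw = b'eyjafjallaj\xc3\xb6kull'

# A bytestring is only a sequence of bytes; nothing is interpreted yet.
byte_count = len(raw)

# Decoding is the step that interprets those bytes as characters.
text = raw.decode('utf-8')

# One fewer than byte_count, because two bytes became one character.
char_count = len(text)
```

Printing raw sends those bytes to the output unchanged, so whether ö shows up correctly then depends on the terminal, not on Python.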

You could try printing Unicode instead:

text = page.decode(response.info().getparam('charset') or 'utf-8')
print text
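
If the console cannot represent ö, printing the Unicode text may itself fail, so it can help to encode explicitly before writing. The following is only a sketch under that assumption: decode_body and encode_for_console are hypothetical helper names, not part of urllib2, and the sample bytes stand in for one line of the real response.

```python
# -*- coding: utf-8 -*-
# Sketch only: both helpers are hypothetical, not part of urllib2.

def decode_body(raw, charset=None):
    """Turn the bytes from response.read() into Unicode text."""
    # Assumption: fall back to UTF-8 when the server declares no charset.
    return raw.decode(charset or 'utf-8')

def encode_for_console(text, encoding):
    """Encode text for a console, replacing characters it cannot represent."""
    # Assumption: treat an unknown console encoding as plain ASCII.
    return text.encode(encoding or 'ascii', 'replace')

# Hypothetical sample standing in for one line of the response.
raw = b'how to pronounce eyjafjallaj\xc3\xb6kull'

text = decode_body(raw)
ascii_safe = encode_for_console(text, 'ascii')
utf8_again = encode_for_console(text, 'utf-8')
```

In the script from the question, the charset would come from response.info().getparam('charset') and the console encoding from sys.stdout.encoding. In Python 2 the latter can be None when output is piped, which is why the sketch falls back to ASCII.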