I'm building a Python application that parses weather data from yr.no. It works with plain ASCII strings but fails when I use Unicode.
    def GetYRNOWeatherData(country, province, place):
        #Parse the XML file
        wtree = ET.parse(urllib.urlopen("http://www.yr.no/place/" + string.replace(country, ' ', '_').encode('utf-8') + "/" + string.replace(province, ' ', '_').encode('utf-8') + "/" + string.replace(place, ' ', '_').encode('utf-8') + "/forecast.xml"))
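For example, when I try

    GetYRNOWeatherData("France", "Île-de-France", "Paris")

I get this error:

    'charmap' codec can't encode character u'\xce' in position 0: character maps to <undefined>

Does urllib handle Unicode poorly? I'm using Tkinter as the front end for this function, so could that be the source of the problem (does the Tkinter Entry widget handle Unicode well)?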
Answer 0 (score: 1)
You can handle this by keeping every string as unicode until you actually make the urllib.urlopen request, and only at that point encode it to utf-8:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    # This import makes all literal strings in the file default to
    # type 'unicode' rather than type 'str'. You don't need to use this,
    # but you'd need to do u"France" instead of just "France" below, and
    # everywhere else you have a string literal.
    from __future__ import unicode_literals

    import urllib
    import xml.etree.ElementTree as ET

    def do_format(*args):
        ret = []
        for arg in args:
            ret.append(arg.replace(" ", "_"))
        return ret

    def GetYRNOWeatherData(country, province, place):
        country, province, place = do_format(country, province, place)
        url = "http://www.yr.no/place/{}/{}/{}/forecast.xml".format(country, province, place)
        wtree = ET.parse(urllib.urlopen(url.encode('utf-8')))
        return wtree

    if __name__ == "__main__":
        GetYRNOWeatherData("France", "Île-de-France", "Paris")
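
The code above targets Python 2. If you are on Python 3, where every str is already Unicode, the same idea of converting only at the boundary might look like the sketch below. It swaps in a different technique from the code above: instead of sending raw UTF-8 bytes, it percent-encodes each path segment with urllib.parse.quote, so the resulting URL is pure ASCII. The helper name build_forecast_url and the BASE_URL constant are hypothetical, introduced only for illustration. The sketch only builds the URL and does not fetch anything.

```python
from urllib.parse import quote

# Base path copied from the question; treated here as an assumption.
BASE_URL = "http://www.yr.no/place"


def build_forecast_url(country, province, place):
    """Return an ASCII-only forecast URL (hypothetical helper).

    Spaces become underscores, as in the code above; any remaining
    non-ASCII characters are percent-encoded as UTF-8 by quote().
    """
    segments = [
        quote(part.replace(" ", "_"), safe="")
        for part in (country, province, place)
    ]
    return "{}/{}/forecast.xml".format(BASE_URL, "/".join(segments))


url = build_forecast_url("France", "Île-de-France", "Paris")
print(url)
```

The string returned here could then be passed to urllib.request.urlopen and the response handed to ET.parse, as in the original function. Passing safe="" means a stray "/" inside a place name is escaped rather than treated as a path separator.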