Question

I am trying to parse an XML document returned by WolframAlpha.

The code I am currently using is:

from xml.etree import ElementTree as ET
import urllib
import urllib2

def ask(s=None):
    # Build the query string; 'apikey' is a placeholder for the real app ID.
    q = urllib.urlencode(dict(input=s, appid='apikey'))
    url = 'http://api.wolframalpha.com/v2/query?' + q
    r = urllib2.urlopen(url)

    # Parse the response directly from the file-like object.
    p = ET.parse(r)
    return p

ask('what is the distance from montreal to New York?')

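I get an "illegal characters in content" error, even though the same code works flawlessly with regular Python (CPython).

Here is the XML: http://pastebin.com/ektp23bN

I am using IronPython 2.7.4 on .NET 4.0, 32-bit.

Any hints?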

Answer 1

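I was able to reproduce the behavior you describe. The ElementTree and xmllib implementations shipped with IronPython 2.7.4 appear to differ from those in CPython 2.7.4 / 2.7.6, so you could (and probably should) file a bug.

You can work around the problem by explicitly decoding the response as UTF-8: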

r = urllib2.urlopen(url)
data = r.read().decode('utf-8')
p = ET.fromstring(data)
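
To try the same idea without a network call, here is a minimal sketch. The helper name parse_response and the sample XML are made up for illustration and are not actual WolframAlpha output. It decodes the raw bytes explicitly and hands the resulting text to ET.fromstring. Note that on CPython 2, passing non-ASCII unicode text to ET.fromstring may raise a UnicodeEncodeError, so this sketch is aimed at the IronPython case above (it also runs on Python 3).

```python
# -*- coding: utf-8 -*-
from xml.etree import ElementTree as ET

def parse_response(raw_bytes, encoding='utf-8'):
    """Decode raw response bytes explicitly, then parse the resulting text.

    Hypothetical helper; the encoding is assumed to be UTF-8 by default.
    """
    text = raw_bytes.decode(encoding)
    return ET.fromstring(text)

# Hypothetical stand-in for r.read(); contains one non-ASCII character.
sample = u'<queryresult success="true"><pod title="Montr\u00e9al"/></queryresult>'.encode('utf-8')

root = parse_response(sample)
pod_title = root.find('pod').get('title')
```

Unlike ET.parse, which returns an ElementTree, ET.fromstring returns the root Element directly, so lookups such as root.find('pod') can be made without calling getroot() first.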
