Question

So, I am receiving some XML data. One example looks like this:

xmlString = '<location>san diego, ça</location>'

At the moment this is a string. I now need to convert it into an XML object using ElementTree's fromstring() method. The import is:

import xml.etree.ElementTree as ET

The method call is:

xml = ET.fromstring(xmlString)

I keep getting an error that says:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position xxx: 
ordinal not in range(128)

To fix this, I have read a lot on StackOverflow and in the Python docs.

The suggestion seems to be to encode and decode the string:

xmlString = xmlString.encode('utf-8', 'ignore')
xmlString = xmlString.decode('ascii', 'ignore')

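The 'ignore' argument is supposed to ignore errors, but they still show up. This is done before converting xmlString to an XML object, yet the error still occurs!

Any ideas?

The full code is: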

xmlString = '<?xml version="1.0" encoding="UTF-8"?><o><location>san diego, ça</location></o>'
xmlString = xmlString.encode('utf-8', 'ignore')
xmlString = xmlString.decode('ascii', 'ignore')
xml = ET.fromstring(xmlString)

I am using Python 2.7.

Answer 1

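You are calling str.encode(); a Python 2 string is already encoded, so Python tries to do the right thing and first decodes it to unicode, so that it can then encode that value back to a byte string for you.

This implicit decode is done with the default codec, ASCII: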

>>> '<?xml version="1.0" encoding="UTF-8"?><o><location>san diego, ça</location></o>'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)

Note that I called .encode(), but the exception is a UnicodeDecodeError; Python is decoding first.
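
To make the hidden step visible, here is a minimal sketch that performs the ASCII decode explicitly instead of letting Python 2 do it behind the scenes. The names raw, error_name and failing_byte are only illustrative, and the character ç is written as the escape \xe7 so the snippet does not depend on how the file is saved:

```python
# The UTF-8 bytes that a Python 2 str literal would hold for the sample
# document (u'\xe7' is the character c-cedilla from "ça").
raw = u'<o><location>san diego, \xe7a</location></o>'.encode('utf-8')

# In Python 2, calling .encode('utf-8') on a byte string first performs
# an implicit decode with the ASCII codec. Doing that decode by hand
# reproduces the failure.
error_name = None
failing_byte = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
    # exc.start is the offset of the first byte ASCII cannot handle.
    failing_byte = raw[exc.start:exc.start + 1]

print(error_name)
print(repr(failing_byte))
```

The byte it trips over is 0xc3, the first byte of the two-byte UTF-8 sequence for ç, which matches the byte named in the traceback.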

However, since ET.fromstring() already wants UTF-8 encoded bytes, you don't need to re-encode the value at all.
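
As a rough sketch of that advice, the bytes can be handed to ET.fromstring() as they are. The variable names are hypothetical, and the data is built from a unicode literal with an escape only so that the example produces the same UTF-8 bytes regardless of the source file's encoding:

```python
import xml.etree.ElementTree as ET

# Build the UTF-8 encoded bytes of the document from the question.
# u'\xe7' is the character c-cedilla; in the original Python 2 code the
# plain str literal already contains these bytes.
xml_bytes = (u'<?xml version="1.0" encoding="UTF-8"?>'
             u'<o><location>san diego, \xe7a</location></o>').encode('utf-8')

# No extra .encode() or .decode() calls: the parser reads the bytes and
# honours the encoding named in the XML declaration.
root = ET.fromstring(xml_bytes)
location = root.find('location').text

print(repr(location))
```

The parsed text comes back as a Unicode value, so any decoding happens inside the parser rather than in your own code.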

If you run into problems parsing the string value, make sure that your text editor saves the Python source file using the correct codec, UTF-8.
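
One way to keep the source file and the interpreter in agreement is a PEP 263 encoding declaration at the top of the file. The sketch below assumes the file really is saved as UTF-8, since the declaration only describes the file and does not convert it. The names text, data and city are illustrative:

```python
# -*- coding: utf-8 -*-
# The declaration above tells Python 2 how the bytes of this source file
# are encoded. It must match what the text editor actually writes.
import xml.etree.ElementTree as ET

# A unicode literal containing the non-ASCII character directly.
text = u'<o><location>san diego, ça</location></o>'

# Encode explicitly, once, to the bytes the parser expects.
data = text.encode('utf-8')

root = ET.fromstring(data)
city = root.find('location').text
print(repr(city))
```

If the editor saves the file in a different encoding, such as Latin-1, the literal will hold different bytes and the parse can fail or produce the wrong character.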
