Question

I have this character in an XML file:

<data>
  <products>
      <color>fumè</color>
  </products>
</data>

I tried to create an ElementTree instance with the following code:

string_data = open('file.xml')
x = ElementTree.fromstring(unicode(string_data.encode('utf-8')))

I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 185: ordinal not in range(128)

(Note: the position is not accurate; I sampled the XML from a larger file.)

How can I fix this? Thanks.

Answer 1

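You may have run into this problem while using Requests (HTTP for Humans). response.text decodes the response by default; use response.content instead to get the undecoded bytes, so that ElementTree can decode them itself. Remember to use the correct encoding.

More info: http://docs.python-requests.org/en/latest/user/quickstart/#response-content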
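
A minimal sketch of the idea, written for Python 3 and without a real HTTP call. The `content` variable is a hypothetical stand-in for the bytes that response.content would hold, and the XML declaration is added only for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for response.content: raw, undecoded bytes.
content = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<data><products><color>fum\xc3\xa8</color></products></data>"
)

# Passing bytes lets the parser apply the encoding declared in the XML itself.
root = ET.fromstring(content)
color = root.find("products/color").text
print(color)
```

With a real response object, the same call would presumably be ET.fromstring(response.content).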

Answer 2

You need to decode the utf-8 string into a unicode object. So

string_data.encode('utf-8')

should be

string_data.decode('utf-8')

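assuming string_data actually is a utf-8 string.

To summarize: to get a utf-8 string from a unicode object, you encode the unicode object (using the utf-8 encoding); to turn a string into a unicode object, you decode the string using the corresponding encoding.

For more details on these concepts, I suggest reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (not Python-specific).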
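
The encode/decode direction can be sketched as a round trip. This sketch assumes Python 3, where the "unicode object" of this answer is called str and the "utf-8 string" is called bytes; the variable names are made up for illustration:

```python
text = "fum\u00e8"                  # text (unicode): 'fumè'

encoded = text.encode("utf-8")      # text -> bytes
decoded = encoded.decode("utf-8")   # bytes -> text

print(encoded)          # b'fum\xc3\xa8'
print(decoded == text)  # True
```

The single character è becomes two bytes in UTF-8, which is why the byte string is one element longer than the text.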

Answer 3

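You don't need to decode the XML for ElementTree to work. XML carries its own encoding information (defaulting to UTF-8), and ElementTree does the work for you, outputting unicode:

>>> from xml.etree import ElementTree
>>> data = '''\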
... <data>
...   <products>
...       <color>fumè</color>
...   </products>
... </data>
... '''
>>> x = ElementTree.fromstring(data)
>>> x[0][0].text
u'fum\xe8'

If your data is contained in a file(-like) object, just pass the filename or file object directly to the ElementTree.parse() function:

x = ElementTree.parse('file.xml')
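
As a rough illustration of the file-object case, here is a Python 3 sketch that uses an in-memory io.BytesIO in place of an opened file. The sample data and variable names are assumptions, not part of the original question:

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for an opened binary file (hypothetical sample data).
xml_file = io.BytesIO(
    "<data><products><color>fum\u00e8</color></products></data>".encode("utf-8")
)

# parse() accepts a filename or a file object and handles decoding itself.
tree = ET.parse(xml_file)
color = tree.getroot()[0][0].text
print(color)
```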

Answer 4

Have you tried using the parse function instead of opening the file yourself? (BTW, the opened file would need a .read() for .fromstring() to work.)

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()
# etc...

Answer 5

open() does not return a string; it returns a file object. Use open('file.xml').read() instead.
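
A self-contained Python 3 sketch of reading the file contents before calling fromstring(). It writes a temporary sample file first, so the path and data here are assumptions for illustration only:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a small UTF-8 sample file so the sketch can run on its own.
xml_bytes = "<data><products><color>fum\u00e8</color></products></data>".encode("utf-8")
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    path = tmp.name

# open() returns a file object; .read() returns its contents.
# Binary mode keeps the data as undecoded bytes for the parser.
with open(path, "rb") as f:
    root = ET.fromstring(f.read())

os.remove(path)  # clean up the temporary sample file
color = root[0][0].text
print(color)
```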

Answer 6

Your file is most likely not UTF-8. The è character can come from another encoding, such as latin-1.
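
If the file really were latin-1, a Python 3 sketch of what happens might look like this. The sample bytes and the added XML declaration are assumptions; whether the original file is latin-1 is not known from the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical latin-1 data: è is the single byte 0xE8.
latin1_bytes = "<data><color>fum\u00e8</color></data>".encode("latin-1")

# Without a declaration the parser assumes UTF-8, where a lone 0xE8 is invalid.
try:
    ET.fromstring(latin1_bytes)
    parsed_as_utf8 = True
except ET.ParseError:
    parsed_as_utf8 = False

# Declaring the actual encoding in the XML lets the parser decode it.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes
color = ET.fromstring(declared).find("color").text
print(parsed_as_utf8, color)
```

Note that this failure is a parse error, not the UnicodeEncodeError from the question, so it is a separate problem from the encode() call.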

ElementTree and unicode
