Question

I am trying to read emoji from this XML file. Copying them by hand works: they print fine and still display correctly in the browser.

import requests
import xml.etree.ElementTree as ET

root = ET.fromstring(requests.get('http://www.unicode.org/repos/cldr/trunk/common/annotations/en.xml').text)

print(root[1][21].attrib['cp'])

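This should be the "smiling face with smiling eyes" emoji, for which bytes('😄', 'utf-8') returns b'\xf0\x9f\x98\x84'. But fetching it with the code above yields 'ð\x9f\x98\x84'. Is there something that needs to be done in the XML parser?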

Answer 1

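Response.text decodes the content before you get it (see http://docs.python-requests.org/en/master/user/quickstart/#response-content), using the charset that requests infers from the HTTP headers rather than the one declared in the XML itself. ElementTree therefore receives an already decoded string, and the declaration <?xml version="1.0" encoding="UTF-8" ?> no longer has any effect on how the bytes are interpreted.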
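The symptom in the question can be reproduced without any network access. The sketch below assumes the wrong charset was ISO-8859-1 (requests documents this as its fallback for text content types that carry no charset in the headers; that this is what happened for this particular URL is an assumption). The variable names are illustrative only.

```python
# U+1F604 is "smiling face with smiling eyes".
emoji = "\U0001F604"

# The bytes the server actually sends for this character in a UTF-8 file.
utf8_bytes = emoji.encode("utf-8")
print(utf8_bytes)  # b'\xf0\x9f\x98\x84'

# Assumption: the bytes get decoded as ISO-8859-1 instead of UTF-8.
# Each byte then becomes one separate character.
mojibake = utf8_bytes.decode("iso-8859-1")
print(repr(mojibake))  # 'ð\x9f\x98\x84'

# The damage is reversible as long as the wrong charset is known.
recovered = mojibake.encode("iso-8859-1").decode("utf-8")
print(recovered == emoji)  # True
```

The second print matches the string reported in the question, which suggests the bytes themselves are fine and only the decoding step is wrong.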

Try Response.content instead, which passes the untouched response bytes to ElementTree:

import requests
import xml.etree.ElementTree as ET

root = ET.fromstring(requests.get('http://www.unicode.org/repos/cldr/trunk/common/annotations/en.xml').content)

print(root[1][21].attrib['cp'])
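To check the difference offline, here is a minimal sketch that feeds ElementTree a small hand-made document instead of the real CLDR file. The XML content and the names xml_bytes, good_cp and bad_cp are made up for illustration, and ISO-8859-1 as the wrong charset is again an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded file, encoded as UTF-8 bytes.
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8" ?>'
    '<ldml><annotations>'
    '<annotation cp="\U0001F604">smile</annotation>'
    '</annotations></ldml>'
).encode("utf-8")

# Passing raw bytes (what Response.content gives you): the parser reads
# the encoding from the XML declaration and decodes correctly.
good_cp = ET.fromstring(xml_bytes)[0][0].attrib["cp"]
print(good_cp.encode("utf-8"))  # b'\xf0\x9f\x98\x84'

# Passing a string that was decoded with the wrong charset (assumed here
# to be ISO-8859-1): the parser cannot undo the earlier mis-decoding.
bad_cp = ET.fromstring(xml_bytes.decode("iso-8859-1"))[0][0].attrib["cp"]
print(repr(bad_cp))  # 'ð\x9f\x98\x84'
```

If a str is needed for other reasons, another option that should work is setting response.encoding = 'utf-8' before reading response.text, but handing the bytes to the parser keeps the XML declaration as the single source of truth.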

Reading emoji from an XML file with Python