I'm using Python, requests, and beautifulsoup4 to parse /admin/state.xml from an Icecast server:
import requests
from bs4 import BeautifulSoup
r = requests.get('<host>/admin/state.xml', auth=('u', 'p'))
soup = BeautifulSoup(r.text, 'lxml-xml', from_encoding='ISO-8859-1')
mount_point_metadata = []
for mp in soup.find_all('source'):
    meta = {}
    # Drop the leading "/" from the mount attribute
    meta['mount_point'] = mp.get('mount')[1:]
    try:
        meta['server_name'] = mp.find('server_name').text
    except AttributeError:
        # No <server_name> element for this mount point
        pass
    mount_point_metadata.append(meta)
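The code works and retrieves the expected data. However, when I inspect the strings in the mount_point_metadata dictionaries, the Norwegian characters are botched, and all the values are utf-8: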
{'mount_point': u'<name redacted>',
 'server_name': u'<redacted> st\xf8rste!'}
(In this case, \xf8 should be the letter ø.)
Why does this happen, even though I give BeautifulSoup the correct encoding with from_encoding='ISO-8859-1'?
Answer 0: (score: 0)
Just call .encode('utf-8') on the retrieved data. I'd guess your code would look something like this:
# Encode the whole string at once; map() over a string would encode it character by character
meta['mount_point'] = mp.get('mount')[1:].encode('utf-8')
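To check what the \xf8 in the printed dictionary actually is, here is a minimal sketch using only the standard library. The sample strings are hypothetical stand-ins copied from the output in the question, not real server data. It suggests that \xf8 is just how repr() writes the code point for ø inside a unicode string, so the value was probably decoded correctly and only needs encoding where bytes are required.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical sample value, copied from the output shown in the question.
server_name = u'<redacted> st\xf8rste!'

# \xf8 inside a unicode string is the escape for code point U+00F8.
char_name = unicodedata.name(u'\xf8')

# The same text as raw ISO-8859-1 bytes (byte 0xF8) decodes to that string.
raw_latin1 = b'<redacted> st\xf8rste!'
decoded = raw_latin1.decode('ISO-8859-1')

# Encode to UTF-8 only at the point where bytes are needed (file, socket, ...).
utf8_bytes = server_name.encode('utf-8')

print(char_name)               # official Unicode name of the character
print(decoded == server_name)  # whether the decoded bytes match the string
print(repr(utf8_bytes))        # the UTF-8 byte representation
```

Printing a single value (rather than the whole dictionary) shows the character itself on a terminal that supports it, because containers display their items with repr(). Separately, as far as I know BeautifulSoup only honors from_encoding when it is given raw bytes, so passing r.content instead of the already-decoded r.text may be what makes that argument take effect.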