Question

My XML file declares its encoding as:

<?xml version="1.0" encoding="utf-8"?>

I am trying to parse this file using Beautiful Soup.

from bs4 import BeautifulSoup

fd = open("xmlsample.xml")  
soup = BeautifulSoup(fd,'lxml-xml',from_encoding='utf-8')

But this results in:

Traceback (most recent call last):
  File "C:\Users\gregg_000\Desktop\Python Experiments\NRE_XMLtoCSV\NRE_XMLtoCSV\bs1.py", line 4, in <module>
    soup = BeautifulSoup(fd,'lxml-xml', from_encoding='utf-8')
  File "C:\Users\gregg_000\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\__init__.py", line 245, in __init__
    markup = markup.read()
  File "C:\Users\gregg_000\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 5343910: character maps to <undefined>

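My sense is that Python wants to use the default cp1252 character set. How can I force utf-8 without resorting to the command line? (I am in a setup where I can't easily make global changes to the Python settings.)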

Answer 1

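You should also pass the encoding to your open() call (the docs list it as an accepted parameter). On Windows (at least on my installation), the default is cp1252. The traceback shows the failure happening in markup.read(), that is, while the file object itself is decoding the bytes, before Beautiful Soup gets a chance to apply from_encoding.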

from bs4 import BeautifulSoup

fd = open("xmlsample.xml", encoding='utf-8')
soup = BeautifulSoup(fd,'lxml-xml',from_encoding='utf-8')
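
To see why the encoding argument on open() matters, here is a minimal sketch that uses only the standard library. The sample document and the file name sample.xml are made up for illustration; they are not the asker's data. It writes a small UTF-8 file containing a right double quotation mark (U+201D), whose UTF-8 encoding ends in the byte 0x9d, then reads it back once with cp1252 (passed explicitly so the failure reproduces on any platform) and once with utf-8.

```python
import locale
import os
import tempfile

# Hypothetical sample document. U+201D encodes in UTF-8 as the bytes
# E2 80 9D, and 0x9D has no mapping in Python's cp1252 codec.
xml_text = '<?xml version="1.0" encoding="utf-8"?>\n<note>\u201cquoted\u201d</note>\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "wb") as f:
        f.write(xml_text.encode("utf-8"))

    # The encoding that text-mode open() falls back to when none is given.
    default_encoding = locale.getpreferredencoding(False)
    print("default text encoding:", default_encoding)

    # Decode as cp1252 explicitly to mimic the Windows default.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_error = None
    except UnicodeDecodeError as exc:
        cp1252_error = exc

    # Decoding with the encoding the file was actually written in succeeds.
    with open(path, encoding="utf-8") as f:
        decoded = f.read()

print("cp1252 read failed:", cp1252_error is not None)
print("utf-8 read round-trips:", decoded == xml_text)
```

Another option, not shown in the answer above, is to open the file in binary mode (open("xmlsample.xml", "rb")) so that Beautiful Soup receives raw bytes; from_encoding only applies to bytes input.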
