
Python - BeautifulSoup - Non-English characters are not read correctly

Date: 2018-06-06 07:22:06

Tags: python html utf-8 character-encoding beautifulsoup

I have been trying to extract data from an HTML page with BeautifulSoup, but characters from other languages are not read correctly.

The code I am using:

from bs4 import BeautifulSoup
with open(r"C:\Myfile.html") as f:  # text mode, no encoding argument given
    soup = BeautifulSoup(f, "html.parser")
htmlText = soup.body.get_text()
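BeautifulSoup does not open a path string itself: passed "C:\Myfile.html" directly, it parses that string as markup. Below is a minimal sketch of two ways to hand it the file so non-ASCII text survives, assuming the file really is saved as UTF-8 (as its meta tag claims). `load_soup` and `load_soup_utf8` are hypothetical helper names used only for illustration.

```python
from bs4 import BeautifulSoup


def load_soup(path):
    # Binary mode: BeautifulSoup receives raw bytes and chooses the encoding
    # itself (byte-order mark, <meta charset>, then fallbacks) instead of
    # relying on Python's platform-dependent default text encoding.
    with open(path, "rb") as f:
        return BeautifulSoup(f, "html.parser")


def load_soup_utf8(path):
    # Alternative: decode explicitly as UTF-8 before BeautifulSoup sees it.
    with open(path, encoding="utf-8") as f:
        return BeautifulSoup(f, "html.parser")
```

Usage would look like `soup = load_soup(r"C:\Myfile.html")` followed by `htmlText = soup.body.get_text()`. The binary-mode variant is the more general choice when the encoding of each file is not known in advance.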

Example of the error: ß is printed as ÃŸ.
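This particular garbling is a recognizable pattern: in UTF-8, ß is the two bytes `C3 9F`, and Windows code page 1252 (cp1252) maps those bytes to Ã and Ÿ. The sketch below reproduces it, assuming the file was written as UTF-8 and read back as cp1252; `mojibake` is a hypothetical helper for illustration.

```python
def mojibake(text, written_as="utf-8", read_as="cp1252"):
    # Encode with the encoding the file was saved in, then decode the same
    # bytes with a different (wrong) encoding, as happens when a UTF-8 file
    # is read with a cp1252 default.
    return text.encode(written_as).decode(read_as)


print("ß".encode("utf-8"))  # b'\xc3\x9f'
print(mojibake("ß"))        # ÃŸ
```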

HTML meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8 ">
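The meta tag can only help if BeautifulSoup is given raw bytes: when it receives an already-decoded `str`, the decoding has happened before parsing and the declared charset plays no part. A minimal sketch with an in-memory document (the sample text "Straße" is an assumption, not from the question) showing detection from bytes, the `original_encoding` attribute it records, and forcing the choice with `from_encoding`:

```python
from bs4 import BeautifulSoup

# Raw UTF-8 bytes, including the same meta tag as in the question.
raw = (b'<html><head><meta http-equiv="Content-Type" '
       b'content="text/html; charset=utf-8 "></head>'
       b'<body>Stra\xc3\x9fe</body></html>')

# Given bytes, BeautifulSoup decides on the encoding itself and records it.
soup = BeautifulSoup(raw, "html.parser")
print(soup.original_encoding)  # the encoding it settled on
print(soup.body.get_text())    # Straße

# If detection ever guesses wrong, the encoding can be forced explicitly.
forced = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")
print(forced.body.get_text())  # Straße
```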

I have also tried soup.decode("utf-8").
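In BeautifulSoup, `decode()` renders the parse tree back into a Python `str`; it does not reinterpret characters that were already decoded wrongly. If the file cannot simply be re-read with the right encoding, one possible repair is to reverse the mistake: turn the text back into bytes with the encoding that was wrongly applied, then decode those bytes correctly. This sketch assumes the text was decoded as cp1252 exactly once; `repair_mojibake` is a hypothetical name, and re-reading the file correctly is the more robust fix.

```python
def repair_mojibake(text, wrong="cp1252", right="utf-8"):
    # Recover the original bytes using the encoding that was wrongly applied,
    # then decode them with the encoding the file was actually saved in.
    return text.encode(wrong).decode(right)


print(repair_mojibake("ÃŸ"))      # ß
print(repair_mojibake("StraÃŸe"))  # Straße
```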

I am using Python 3.6.
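In Python 3.6, `open()` in text mode without an `encoding` argument decodes with `locale.getpreferredencoding(False)`, which on many Western-European Windows systems is cp1252 rather than UTF-8. A small diagnostic sketch (the helper name `default_text_encoding` is hypothetical) for checking what a given machine uses:

```python
import codecs
import locale


def default_text_encoding():
    # The encoding open() falls back to in text mode when no encoding= is
    # passed (Python 3.6 behaviour); it depends on the system locale.
    return locale.getpreferredencoding(False)


enc = default_text_encoding()
# codecs.lookup normalizes aliases, e.g. 'cp1252' or 'UTF-8' -> canonical name.
print(enc, codecs.lookup(enc).name)
```

If this prints something other than a UTF-8 codec, reading a UTF-8 file without an explicit encoding will produce garbling like the ÃŸ shown above.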

Why is this happening? Please help.

0 answers