Question

My code:

html = "<tag>&nbsp;</tag>"
from bs4 import BeautifulSoup
print BeautifulSoup(html).renderContents()

Output:

<tag>┬á</tag>

Expected output:

<tag>&nbsp;</tag>

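BeautifulSoup seems to be replacing my non-breaking-space HTML escape (&nbsp;) with the Unicode character that means the same thing (U+00A0). That character does not pass through my system quite right: it ends up as a literal non-breaking space instead of the entity, so it does not do what I want. Is there a way to tell BeautifulSoup not to do this?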

Answer 1

Use encode_contents instead of renderContents (encode or prettify also work). They all accept a formatter argument; pass 'html' as the formatter:

html = "<tag>&nbsp;</tag>"
from bs4 import BeautifulSoup
print BeautifulSoup(html).encode_contents(formatter='html')

This produces:

<tag>&nbsp;</tag>
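
For readers on Python 3, the same idea can be sketched as below. This is a minimal illustration rather than part of the original answer. It assumes a current bs4 release and names the built-in "html.parser" explicitly (an assumption, since the code above lets bs4 choose a parser). It uses decode_contents, the str-returning counterpart of encode_contents. The variable names are made up for the example.

```python
from bs4 import BeautifulSoup

html = "<tag>&nbsp;</tag>"

# Name the parser explicitly so the result does not depend on which
# optional parsers (lxml, html5lib) happen to be installed.
soup = BeautifulSoup(html, "html.parser")

# Default formatter ("minimal"): only &, < and > are escaped, so the
# entity comes back as the raw U+00A0 character.
default_text = soup.decode_contents()

# formatter="html": characters that have a named HTML entity are
# written out as that entity again.
entity_text = soup.decode_contents(formatter="html")

print(repr(default_text))
print(entity_text)
```

encode_contents(formatter="html") takes the same formatter argument but returns bytes, which suits writing straight to a file or socket. decode_contents is the more convenient choice when the result is printed or processed further as text.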