Question

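I want to scrape a table from the web and keep the &nbsp; entities intact so that I can republish it as HTML later. However, BeautifulSoup seems to be converting them to spaces. For example: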

from bs4 import BeautifulSoup

html = "<html><body><table><tr>"
html += "<td>&nbsp;hello&nbsp;</td>"
html += "</tr></table></body></html>"

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
table = soup.find_all('table')[0]
row = table.find_all('tr')[0]
cell = row.find_all('td')[0]

print(cell)

Observed result:

<td> hello </td>

Desired result:

<td>&nbsp;hello&nbsp;</td>

Answer 1

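In bs4, the convertEntities argument to the BeautifulSoup constructor is no longer supported. HTML entities are always converted to the corresponding Unicode characters when the document is parsed (see the docs), so the "spaces" in the observed output are really non-breaking space characters (U+00A0).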

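According to the docs, you need to use an output formatter to turn those characters back into entities when you serialize, like this: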

print(soup.find_all('td')[0].prettify(formatter="html"))
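
Note that prettify() also re-indents the markup, so the cell is spread over several lines. If the goal is the exact single-line result from the question, a minimal sketch along these lines may be closer to what is wanted. It assumes Python 3 and the built-in "html.parser", and the variable names are only illustrative; it uses decode(), which accepts the same formatter argument but does not add indentation:

```python
from bs4 import BeautifulSoup

html = "<html><body><table><tr><td>&nbsp;hello&nbsp;</td></tr></table></body></html>"
soup = BeautifulSoup(html, "html.parser")
cell = soup.find("td")

# After parsing, the entity is stored as the Unicode character U+00A0.
cell_text = cell.string

# Default serialization writes that character out as-is (it looks like a space).
default_markup = cell.decode()

# formatter="html" converts characters back to named entities where one exists.
entity_markup = cell.decode(formatter="html")

print(entity_markup)  # <td>&nbsp;hello&nbsp;</td>
```

The same formatter="html" argument can also be passed to encode() if bytes are needed rather than a string.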
