Question

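I have some text:

text = '<p>&lt;b&gt;test&lt;/b&gt;<br/></p>'

I parse it with Beautiful Soup 4:

soup = BeautifulSoup(text, "html.parser") # soup: <p>&lt;b&gt;test&lt;/b&gt;<br/></p>

Then I get the text nodes:

text_nodes = soup.find_all(text=True)

But the escaped HTML gets unescaped in the process: text_nodes: ['<b>test</b>']

How can I prevent the find_all() step from converting my escaped HTML tags?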

Answer 1

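With text=True, I don't think there is a way to keep the strings exactly as they appear in the source.

My workaround is to re-escape the results in a loop: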

from bs4 import BeautifulSoup
from html import escape

text = '<p>&lt;b&gt;test&lt;/b&gt;<br/></p>'
soup = BeautifulSoup(text, "html.parser")
text_nodes = [escape(x) for x in soup.strings]
print(text_nodes)
# ['&lt;b&gt;test&lt;/b&gt;']

soup.strings yields the same text nodes as soup.find_all(text=True) here, but as a generator rather than a list.
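For reuse, the same workaround can be wrapped in a small helper. This is only a sketch: the name escaped_text_nodes is made up for illustration, and it assumes Beautiful Soup 4.4.0 or later, where the string= argument is the current spelling of text=. Passing quote=False is a choice made here so that quotation marks in text nodes are left alone; html.escape would otherwise also rewrite " and '.

```python
from html import escape

from bs4 import BeautifulSoup


def escaped_text_nodes(markup: str) -> list:
    """Return every text node of `markup` with &, < and > re-escaped.

    Hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # The parser has already decoded the entities, so each node holds
    # plain characters; escape() turns &, < and > back into entities.
    return [escape(node, quote=False) for node in soup.find_all(string=True)]


sample = '<p>&lt;b&gt;test&lt;/b&gt;<br/></p>'
nodes = escaped_text_nodes(sample)
print(nodes)
# ['&lt;b&gt;test&lt;/b&gt;']
```

Note that re-escaping is not guaranteed to reproduce the source byte for byte. An entity such as &nbsp; is decoded to a non-breaking space by the parser, and escape() does not turn it back into &nbsp;.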

Prevent BeautifulSoup's find_all() from converting escaped HTML tags
