Question

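I am extracting the text from an HTML page with all tags removed (using Python and BeautifulSoup). However, the tags are not replaced with whitespace. So, for example, for "blah blahDIVTAGblah" I get the text "blah blahblah". How can I insert a space between the second and third blah? I am using the following code.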

# kill all script and style elements
for script in soup(["script", "style"]):
    script.extract()

The code is from BeautifulSoup Grab Visible Webpage Text.

Answer 1

You can use .replace_with() to simply replace the tags with whitespace:

for script in soup(["script", "style"]):
    script.replace_with(" ")
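
For a version that can be run end to end, here is a minimal sketch. It makes a few assumptions that are not in the question: the sample HTML string is made up, a `<div>` stands in for DIVTAG, the built-in "html.parser" backend is used, and `visible_text` is a hypothetical helper name. Besides the `.replace_with(" ")` loop, it passes `separator=" "` to `get_text()`, which is one way to get a space between text that comes from different tags.

```python
from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Hypothetical helper: return the page text with a space where tags were."""
    soup = BeautifulSoup(html, "html.parser")

    # Replace each script/style element with a single space.
    for element in soup(["script", "style"]):
        element.replace_with(" ")

    # Join the remaining text pieces with a space instead of concatenating them.
    spaced = soup.get_text(separator=" ")

    # Collapse runs of whitespace left over from the replacements.
    return " ".join(spaced.split())


# Made-up sample input (assumption): the <div> plays the role of DIVTAG.
sample = "<body>blah blah<div>blah</div><script>var x = 1;</script>end</body>"
print(visible_text(sample))  # blah blah blah end
```

Calling `get_text()` without a separator on the same input would give "blah blahblah end", which is the behaviour described in the question.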

How do I replace HTML tags with whitespace in extracted text?
