Question

我正在使用BeautifulSoup从此网站https://lawyers.justia.com/lawyer/michael-paul-ehline-85006抓取

我不希望在输出中显示赞助商清单：

我的代码：

for o in soup.findAll('div', attrs={"class": "block-wrapper"}): 
    for de in o.findAll("li"):
        if de != []:
            de=remove_tags(str(de))
            print (de)

python输出： OUTPUT IMAGE

Answer 1

您可以删除HTML页面中的内容。找到使用findAll（'div'，attrs = {“ class”：“ primary-sidebar-wrapper”}）所需的元素之后。您可以执行以下操作：

tag = soup.findAll('div', attrs={"class": "block-wrapper"})
tag[0].replace_with("")

这还将替换变量汤中的HTML文本

如何从网络抓取功能中删除某些信息（Beautiful Soup）：

1 个答案: