Question

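I am using BeautifulSoup to scrape comments from many pages of a website. Every page of this site contains the comment "[[commentMessage]]". I want to filter out this string so that it is not printed every time the code runs. I am new to Python and BeautifulSoup, and after looking around I can't seem to find how to do this, though I may be searching for the wrong thing. Any suggestions? My code is below: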

from bs4 import BeautifulSoup
import urllib
r = urllib.urlopen('website url').read()
soup = BeautifulSoup(r, "html.parser")
comments = soup.find_all("div", class_="commentMessage")
for element in comments:
    print element.find("span").get_text()

All of the comments are in divs with the class commentMessage, including the unwanted comment "[[commentMessage]]".

Answer 1

A simple if statement will do:

for element in comments:
    text = element.find("span").get_text()
    if "[[commentMessage]]" not in text:
        print text
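
For readers on Python 3, where print is a function, the same check could be wrapped in a small generator. This is only a sketch: the helper name real_comments and the sample strings are made up for illustration, and the list stands in for the values returned by element.find("span").get_text() in the loop above.

```python
# Hypothetical helper; the name and the sample data are illustrative only.
PLACEHOLDER = "[[commentMessage]]"


def real_comments(texts, placeholder=PLACEHOLDER):
    """Yield only the texts that do not contain the template placeholder."""
    for text in texts:
        if placeholder not in text:
            yield text


# Stand-in for the strings extracted from each div.commentMessage span.
sample_texts = ["Great article!", "[[commentMessage]]", "Thanks for sharing."]

for text in real_comments(sample_texts):
    print(text)
```

Note that the substring test also drops any genuine comment that happens to quote the placeholder; comparing text.strip() == placeholder instead would skip only the bare template entry.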

Filter out a string from the print statement in Python / BeautifulSoup
