Question

I am currently using Selenium and Beautiful Soup to scrape all of the HTML from a website. At the moment, all of the data is stored in a Python variable.

soup = BeautifulSoup(driver.page_source, 'lxml')

What is the best way to find two different words, exactly "Open" or exactly "Closed", and print them to the console in the order they are found?

I tried the following:

for node in soup.find_all(text=lambda x: x and "Open" in x):
    print(node)

But how can I make it also search for exactly "Closed"?

My updated code:

soup = BeautifulSoup(driver.page_source, 'lxml')

status = soup.find('div', attrs={"class":"pagebodydiv"})

# Open the file in text mode with UTF-8 so each matched string can be written directly
with open("status.txt", "w", encoding="utf-8") as file:
    for node in status.find_all(text=lambda t: t in ('Open', 'Closed')):
        file.write(node + "\n")

Answer 1

You can use any() here.

for node in soup.find_all(text=lambda t: t and any(x in t for x in ['Open', 'Closed'])):
    print(node)

This works well as a general solution. If you want to search for more words, just add them to the list.
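To see what this filter does with each text node, here is a minimal sketch of the same any() test applied to plain strings. The helper name contains_any and the sample strings are made up for illustration; they are not part of the original question.

```python
def contains_any(text, words):
    """Return True if text is non-empty and contains at least one of words."""
    # bool(text) guards against None and empty strings, like `t and ...` in the lambda
    return bool(text) and any(word in text for word in words)


words = ["Open", "Closed"]
samples = ["Open", "Closed", "Opened late", "Pending", None]

for sample in samples:
    print(repr(sample), contains_any(sample, words))
```

Note that this is a substring test, so it also accepts "Opened late". That makes it broader than an exact match on "Open".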

If you want to know what any() does, see the documentation:

any(iterable):

Return True if any element of the iterable is true. If the iterable is empty, return False. Equivalent to:

def any(iterable):
    for element in iterable:
        if element:
            return True
    return False

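Edit: Use the solution above if you want to find text that contains the specified words. If you want to match the exact text instead (as stated in the edited question), you can use what @Jatimir mentioned in the comments:

for node in soup.find_all(text=lambda t: t in ('Open', 'Closed')):
    print(node)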
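The exact-match filter can be tried on a small page as follows. This is a minimal sketch under a few assumptions: the sample HTML is made up, html.parser stands in for lxml so that no extra parser is needed, and string= is used as the newer name for the text= argument (Beautiful Soup 4.4.0 and later). The .strip() call assumes that surrounding whitespace should be ignored; a plain `t in ('Open', 'Closed')` test would miss a node such as " Closed ".

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in the question the markup comes from driver.page_source
SAMPLE_HTML = """
<div class="pagebodydiv">
  <table>
    <tr><td>Section A</td><td>Open</td></tr>
    <tr><td>Section B</td><td> Closed </td></tr>
    <tr><td>Section C</td><td>Opened late</td></tr>
    <tr><td>Section D</td><td>Open</td></tr>
  </table>
</div>
"""

WANTED = {"Open", "Closed"}


def find_statuses(html):
    """Return the exact 'Open'/'Closed' strings in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # Keep only text nodes whose stripped content is exactly one of the wanted words
    nodes = soup.find_all(string=lambda t: t is not None and t.strip() in WANTED)
    return [node.strip() for node in nodes]


for status in find_statuses(SAMPLE_HTML):
    print(status)
```

find_all returns matches in document order, so the statuses print in the order they appear on the page, and "Opened late" is skipped because it is not an exact match.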

BeautifulSoup: finding two different strings
