Question

I want to count how many times "file it" appears between tags in a web page. Here is my code.

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.crummy.com/software/BeautifulSoup/")
bsObj = BeautifulSoup(html, "html.parser")

nameList = bsObj.findAll(text="file it")
print(len(nameList))

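With "file it" or "Download" it works fine and the result is 1. With "Hall of Fame" it also works and the result is 2.

But for "the discussion group" the result should be 2, and instead I get 0.

Why do I get 0 in the "the discussion group" case or the "get the source code" case?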

Answer 1

import re
# \s+ matches any run of whitespace, including a newline inside the text
nameList = bsObj.findAll(text=re.compile(r"the\s+discussion\s+group"))

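Use \s+ in the regular expression to match any run of whitespace, including \n. A plain string passed to text= has to equal the whole text node exactly, so if the phrase is wrapped across two lines in the page source, the exact match fails and the count is 0.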
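To see the difference without fetching the page, here is a minimal sketch that uses only the standard library in place of BeautifulSoup. The HTML snippet and the TextCollector class are made up for illustration. It assumes the link text is wrapped across two lines in the source, which is what the 0 result suggests. The exact comparison stands in for text="..." and pattern.search stands in for text=re.compile(...).

```python
import re
from html.parser import HTMLParser

# Hypothetical snippet: the first link's text is wrapped across two source lines.
SAMPLE_HTML = """<p>Ask in <a href="#">the discussion
group</a> or <a href="#">file it</a>.</p>"""


class TextCollector(HTMLParser):
    """Collect the text found between tags, one entry per text node."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        self.nodes.append(data)


collector = TextCollector()
collector.feed(SAMPLE_HTML)
collector.close()

phrase = "the discussion group"
pattern = re.compile(r"the\s+discussion\s+group")

# Exact comparison: the node contains "\n", so it never equals the phrase.
exact_hits = [node for node in collector.nodes if node == phrase]

# Regex search: \s+ accepts the newline, so the node is found.
regex_hits = [node for node in collector.nodes if pattern.search(node)]

print(len(exact_hits))  # 0
print(len(regex_hits))  # 1
```

"file it" sits on a single line in this snippet, so an exact comparison finds it, which fits the cases that already worked in the question.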
