Question

I am trying to find a tag whose class attribute contains the following substring: borderbox flightbox p2.

For example: <div class="borderbox flightbox p2 my-repeat-animation ng-scope"...

So I thought this should work:

soup.find_all('div',class_=re.compile(r"borderbox flightbox p2"+".*"))

But it does not find anything. Do you have any suggestions?

Answer 1

BeautifulSoup applies the regular expression with the equivalent of re.search(), not re.match(), so the trailing .* is unnecessary.

Try

soup.find_all('div', class_=re.compile(r'borderbox flightbox p2 \d+'))
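
To illustrate the search-versus-match distinction this answer relies on, here is a small sketch that uses only Python's re module. The sample string is taken from the class attribute in the question; the shorter pattern is an assumption chosen for illustration.

```python
import re

# Sample string, taken from the class attribute in the question.
class_attr = "borderbox flightbox p2 my-repeat-animation ng-scope"

# Illustrative pattern that occurs in the middle of the string.
pattern = re.compile(r"flightbox p2")

# re.search scans the whole string, so a match in the middle is found.
found_by_search = pattern.search(class_attr) is not None

# re.match only matches at the start of the string, so this one fails.
found_by_match = pattern.match(class_attr) is not None

print(found_by_search, found_by_match)  # True False
```

Because search() already scans the whole string, appending .* to the pattern does not change whether it matches.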

Answer 2

This should do what you want:

def match_tag(tag, classes):
    # True if the tag is a div that carries every class in `classes`.
    return (tag.name == 'div'
            and 'class' in tag.attrs
            and all(c in tag['class'] for c in classes))

divs = soup.find_all(lambda t: match_tag(t, ['borderbox', 'flightbox', 'p2']))

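In BeautifulSoup 4, a regular expression passed to the class_ argument is applied to each CSS class separately. BeautifulSoup checks every CSS class your div has to see whether it matches the regular expression you gave it. Put in code, it looks roughly like this: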

def matching_divs(divs, regexp):
    for div in divs:
        if any(regexp.search(css_class) for css_class in div['class']):
            yield div

Of course, no individual class can match your regular expression: 'borderbox flightbox p2' cannot be found in 'borderbox', 'flightbox', or 'p2'.
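
The per-class behaviour described in this answer can be reproduced without BeautifulSoup. The sketch below is only a simulation: the list stands in for what tag['class'] would return for the div in the question, and the second pattern is an assumption added for contrast.

```python
import re

# Stand-in for tag['class'] on the div from the question (assumed sample).
classes = ["borderbox", "flightbox", "p2", "my-repeat-animation", "ng-scope"]

# The multi-class pattern from the question, without the trailing ".*".
regexp = re.compile(r"borderbox flightbox p2")

# Each class is searched on its own, so the multi-class pattern never hits.
per_class_hits = [c for c in classes if regexp.search(c)]

# A pattern that targets a single class does match one of the items.
single_hits = [c for c in classes if re.search(r"^p2$", c)]

print(per_class_hits)  # []
print(single_hits)     # ['p2']
```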

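The solution is to use BeautifulSoup's support for passing a function that does the matching for you. match_tag checks that (1) the tag is a div and (2) the tag contains every CSS class specified by the classes argument.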
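
The class check at the heart of match_tag can be tried in isolation. In this minimal sketch, has_all_classes is a hypothetical helper that works on plain lists rather than on BeautifulSoup tags; it is meant to show that the check ignores the order of the classes, which a single regular expression over the attribute string would not.

```python
def has_all_classes(tag_classes, required):
    """Return True if every required class appears in tag_classes."""
    return set(required).issubset(tag_classes)

required = ["borderbox", "flightbox", "p2"]

# All three classes present, in the order used in the question.
in_order = has_all_classes(["borderbox", "flightbox", "p2", "ng-scope"], required)

# Same classes in a different order still pass the check.
shuffled = has_all_classes(["p2", "ng-scope", "flightbox", "borderbox"], required)

# A missing class ("p2") makes the check fail.
missing = has_all_classes(["borderbox", "flightbox"], required)

print(in_order, shuffled, missing)  # True True False
```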

Regular expression in Beautiful Soup doesn't work
