Find all div tags whose class is "post-" followed by some digits?

Asked: 2016-08-23 14:06:19

Tags: python regex beautifulsoup

I want to find all div tags whose class is "post-<some number> some text". There are multiple div tags, for example:

<div class="post-3562 some text">
<div class="post-some text">
<div class="post-some text">
<div class="post-1324 some text">
<div class="post-4540 some text">
<div class="post-some text">
<div class="post-1122 some text">

I only want the div tags with class="post-<some number>".

What I have written so far is:

allPostsDiv = soup.find_all("div", class_= "post")

Is there a way to achieve what I want? Would a regular expression help? Any help would be appreciated.

2 answers:

Answer 0 (score: 3)

You can pass a compiled regular expression as the value of the class_ argument, like this:

soup.find_all(name='div', class_=re.compile(r'^post-\d+$'))

Full program:

from bs4 import BeautifulSoup
import re

soup = BeautifulSoup('''
<root>
<div class="post-3562 some text"/>
<xdiv class="post-9999 some text"/>
<div class="post-some text"/>
<div class="post-some text"/>
<div class="post-1324some text"/>
<div class="some post-4540 text"/>
<div class="post-some text"/>
<div class="some text post-1122"/>
</root>''', 'html.parser')

for div in soup.find_all(name='div', class_=re.compile(r'^post-\d+$')):
    print(div)

Result:

<div class="post-3562 some text"></div>
<div class="some post-4540 text"></div>
<div class="some text post-1122"></div>
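
Why the anchored pattern works: Beautiful Soup treats class as a multi-valued attribute and tests the pattern against each class token separately, so ^post-\d+$ matches the token post-3562 but not post-1324some. Here is a minimal sketch of that per-token check in plain Python. It assumes whitespace-separated tokens, the helper name has_numbered_post_class is hypothetical, and it only approximates the behavior rather than reproducing the library's implementation:

```python
import re

# Same anchored pattern as in the answer above.
POST_CLASS = re.compile(r'^post-\d+$')

def has_numbered_post_class(class_attr):
    """Return True if any whitespace-separated class token matches the pattern.

    Hypothetical helper for illustration; it is not part of Beautiful Soup.
    """
    return any(POST_CLASS.match(token) for token in class_attr.split())

samples = [
    "post-3562 some text",
    "post-some text",
    "post-1324some text",   # digits run straight into "some", so no token matches
    "some text post-1122",  # position of the token within the attribute does not matter
]

kept = [s for s in samples if has_numbered_post_class(s)]
print(kept)
```

The original attempt with class_="post" finds nothing for the same reason: a plain string has to equal one of the class tokens exactly, and none of them is just "post".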

Answer 1 (score: -1)

The following regular expression will match your test cases:

/<div +class= *"post-\d+.*>/g

Regex Tester link: https://regex101.com/r/cX1qZ7/1
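
The pattern above is written in regex101 notation, where the slashes and the g flag are delimiters rather than part of the expression. A rough Python equivalent, applied to raw HTML text, might look like the sketch below. The sample markup and the name DIV_POST are assumptions made for illustration. Note that this approach only matches when class is the first attribute and the numbered class comes first in it, so it is more fragile than filtering with a parser:

```python
import re

# The pattern from the answer, without the /.../g delimiters;
# findall plays the role of the global flag.
DIV_POST = re.compile(r'<div +class= *"post-\d+.*>')

html = '''<div class="post-3562 some text">
<div class="post-some text">
<div class="post-1324 some text">
<div class="some post-4540 text">'''

# "." does not match newlines by default, so each match stays on one line.
matches = DIV_POST.findall(html)
for tag in matches:
    print(tag)
```

In this sample the last div is skipped even though it carries post-4540, because the pattern expects post- immediately after the opening quote.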