I want to select all <div> elements whose class is post has-profile bg2 or post has-profile bg1, but not the last one, which has the class panel.
<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>
select() only matched a single occurrence for me. I am now trying find_all(), but bs4 does not find the elements.
if soup.find(class_ = re.compile(r"post has-profile [bg1|bg2]")):
    posts = soup.find_all(class_ = re.compile(r"post has-profile [bg1|bg2]"))
How can this be solved both with and without regular expressions? Thanks.
Answer 0 (score: 1)
You can use the CSS selector support built into BeautifulSoup:
data = """<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
divs = soup.select('div.post.has-profile.bg2, div.post.has-profile.bg1')
for div in divs:
    print(div)
    print('-' * 80)
Prints:
<div class="post has-profile bg2" id="6"> some text 1 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg2" id="8"> some text 3 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg1" id="7"> some text 2 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg1" id="9"> some text 4 </div>
--------------------------------------------------------------------------------
The selector 'div.post.has-profile.bg2, div.post.has-profile.bg1' selects all <div> tags with the class "post has-profile bg2" and all <div> tags with the class "post has-profile bg1".
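The two comma-separated selectors above share most of their text. As a minimal sketch, assuming a BeautifulSoup version whose select() is backed by soupsieve (bs4 4.7 or later), the shared part can be written once with the :is() pseudo-class. The variable names html, posts and post_ids are illustrative only, and html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

html = """<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>"""

soup = BeautifulSoup(html, "html.parser")

# :is() matches an element if it matches any selector in its list,
# so "div.post.has-profile" only has to be written once.
posts = soup.select("div.post.has-profile:is(.bg1, .bg2)")

# Collect the ids of the matched <div> elements.
post_ids = [div["id"] for div in posts]
print(post_ids)
```

The panel <div> is left out because it has neither the post nor the has-profile class, so no explicit :not(.panel) is needed for this sample.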
Answer 1 (score: 1)
You can define a function that describes the tags of interest:
def test_tag(tag):
    return tag.name == 'div' \
        and tag.has_attr('class') \
        and "post" in tag['class'] \
        and "has-profile" in tag['class'] \
        and ("bg1" in tag['class'] or "bg2" in tag['class']) \
        and "panel" not in tag['class']
and apply that function to the soup:
soup.find_all(test_tag)
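For reference, here is a self-contained sketch of the function-based approach run against the sample HTML from the question. It uses set operations instead of chained "in" checks, which is only a stylistic variation. The names is_profile_post, REQUIRED and BACKGROUNDS are hypothetical and not part of bs4.

```python
from bs4 import BeautifulSoup

html = """<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>"""

REQUIRED = {"post", "has-profile"}   # classes every match must carry
BACKGROUNDS = {"bg1", "bg2"}         # at least one of these must be present


def is_profile_post(tag):
    """Return True for <div> tags with the wanted class combination."""
    # bs4 exposes "class" as a list of names; default to an empty list.
    classes = set(tag.get("class", []))
    return (
        tag.name == "div"
        and REQUIRED <= classes
        and bool(BACKGROUNDS & classes)
        and "panel" not in classes
    )


soup = BeautifulSoup(html, "html.parser")

# find_all() accepts a function and calls it once per tag in the document.
matched = soup.find_all(is_profile_post)
matched_ids = [tag["id"] for tag in matched]
print(matched_ids)
```

Because the check works on individual class names, it does not depend on the order in which the classes appear in the attribute.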
Answer 2 (score: 0)
Use a regular expression.
Try:
from bs4 import BeautifulSoup
import re
s = """<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>"""
soup = BeautifulSoup(s, "html.parser")
for i in soup.find_all("div", class_=re.compile(r"post has-profile bg(1|2)")):
    print(i)
Output:
<div class="post has-profile bg2" id="6"> some text 1 </div>
<div class="post has-profile bg1" id="7"> some text 2 </div>
<div class="post has-profile bg2" id="8"> some text 3 </div>
<div class="post has-profile bg1" id="9"> some text 4 </div>
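A note on the pattern from the question: [bg1|bg2] is a character class, so it matches a single character out of b, g, 1, | and 2 rather than the word "bg1" or "bg2". The sketch below uses only the standard re module on plain strings to show the difference from the grouped alternation bg(1|2). The sample strings, including the made-up class "big", are illustrative assumptions.

```python
import re

# Character class: one character from the set {b, g, 1, |, 2}.
char_class = re.compile(r"post has-profile [bg1|bg2]")

# Grouped alternation: the literal "bg" followed by 1 or 2.
alternation = re.compile(r"post has-profile bg(1|2)")

samples = [
    "post has-profile bg1",
    "post has-profile bg2",
    "post has-profile big",  # hypothetical class, only to show the difference
    "panel bg1",
]

loose = [s for s in samples if char_class.search(s)]
strict = [s for s in samples if alternation.search(s)]

print(loose)
print(strict)
```

The character class also accepts "post has-profile big", because a single "b" after the space is enough for it, while the alternation accepts only the two intended class strings.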