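I'm having trouble parsing HTML. I'm working with a website that has a list whose items carry different class names. What I want is to find all of them in a single findAll call, like this: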
page_soup.findAll("li", {"Class" : "Class1" or "Class2"})
I want to use "or" between the classes.
Sample HTML:
<ol class="products-list" id="products">
<li class="item odd">
</li>
<li class="item even">
</li>
<li class="item last even">
</li>
</ol>
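A note on why the attempt above does not behave like an "or" filter: Python evaluates `"Class1" or "Class2"` before BeautifulSoup ever sees it, and `or` returns its first truthy operand. The sketch below uses only plain Python to show what actually gets passed; the variable names `attrs` and `first_truthy` are just for illustration. The attribute in the sample HTML is also spelled lowercase `class`, so the capitalised key `"Class"` is worth double-checking.

```python
# Python evaluates the `or` expression first, before the dict is built.
attrs = {"Class": "Class1" or "Class2"}
print(attrs)  # {'Class': 'Class1'}

# `or` returns its first truthy operand, so "Class2" never reaches the filter.
first_truthy = "Class1" or "Class2"
print(first_truthy)  # Class1
```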
Answer 0 (score: 1)
Use select(), which can be faster than findAll(). A comma in the CSS selector acts as "or":
page_soup = BeautifulSoup(html, 'html.parser')
# ".odd,.even" matches any element whose class list contains odd OR even
for item in page_soup.select(".odd,.even"):
    print(item)
Full code:
from bs4 import BeautifulSoup
html='''<ol class="products-list" id="products">
<li class="item odd">
</li>
<li class="item even">
</li>
<li class="item last even">
</li>
</ol>
'''
page_soup = BeautifulSoup(html, 'html.parser')
# Prints all three <li> items: each one has either the odd or the even class
for item in page_soup.select(".odd,.even"):
    print(item)
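If you would rather stay with find_all, the same "or" can probably be expressed by passing a list as the class filter. The following is a minimal sketch, assuming the sample HTML from the question (with the empty `<li>` tags collapsed onto one line); `matched_classes` is a name introduced here only for illustration.

```python
from bs4 import BeautifulSoup

html = '''<ol class="products-list" id="products">
<li class="item odd"></li>
<li class="item even"></li>
<li class="item last even"></li>
</ol>'''

page_soup = BeautifulSoup(html, "html.parser")

# A list passed as the class filter matches a tag that has ANY of the listed classes.
items = page_soup.find_all("li", class_=["odd", "even"])

# Collect the class lists of the matched tags to see what was selected.
matched_classes = [item["class"] for item in items]
print(matched_classes)
```

Unlike the selector ".odd,.even", this restricts the match to `<li>` tags; the CSS equivalent would be "li.odd, li.even".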
Answer 1 (score: 0)
Complete working example:
from bs4 import BeautifulSoup
text = """
<body>
<ul>
<li class="Class1">Class 1</li>
<li class="Class2">Class 2</li>
<div class="Class1 special">Class 1 in div</div>
<div class="Class2 special">Class2 in div</div>
</ul>
</body>"""
soup = BeautifulSoup(text,"lxml")
# Keep only <li> tags whose class attribute is exactly Class1 or exactly Class2
result = soup.find_all(lambda tag: tag.name == 'li' and
                       (tag.get('class') == ['Class1'] or tag.get('class') == ['Class2']))
print(result)
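Note that comparing against `['Class1']` only matches tags whose class list is exactly that one class, so an `<li>` with an additional class would be skipped. If that is not what you want, one possible variation is to test for overlap with a set of wanted classes. This is a sketch under assumptions: the `extra` and `Class3` classes and the helper name `is_wanted_li` are made up for the example, and it uses `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

text = """
<ul>
<li class="Class1">Class 1</li>
<li class="Class2 extra">Class 2</li>
<li class="Class3">Class 3</li>
<div class="Class1 special">Class 1 in div</div>
</ul>"""

soup = BeautifulSoup(text, "html.parser")
wanted = {"Class1", "Class2"}

def is_wanted_li(tag):
    # True for <li> tags that carry at least one of the wanted classes.
    # tag.get("class", []) falls back to an empty list for tags with no class.
    return tag.name == "li" and not wanted.isdisjoint(tag.get("class", []))

result = soup.find_all(is_wanted_li)
texts = [tag.get_text() for tag in result]
print(texts)  # ['Class 1', 'Class 2']
```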