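BeautifulSoup and spaces in the class attribute

Date: 2017-10-12 20:29:52

Tags: python beautifulsoup

Using BeautifulSoup and Python, I want find_all to match every tr element with a given class attribute that contains multiple names, like this: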

<tr class="admin-bookings-table-row bookings-history-row  paid   ">

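I have tried several ways of matching this class, including regular expressions and wildcards, but I always get an empty list.

Is there a way to match this class, with a regular expression, a wildcard, or something else?

The same question was posted here, but it received no answer.

3 answers:

Answer 0 (score: 7)

You can use a CSS selector to match multiple classes: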

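from bs4 import BeautifulSoup

html = '''
<tr class="admin-bookings-table-row bookings-history-row  paid   "></tr>
<tr class="admin-bookings-table-row  nope  paid   "></tr>
'''
# Import the class under its own name so the parsed document does not shadow it.
soup = BeautifulSoup(html, 'lxml')

# Chain the class names with dots: a row matches only if it has all three.
res = soup.select('tr.admin-bookings-table-row.bookings-history-row.paid')
print(res)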

>>> [<tr class="admin-bookings-table-row bookings-history-row paid "></tr>]

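Otherwise, this answer may also help: https://stackoverflow.com/a/46719501/6655211

Answer 1 (score: 2)

An HTML class name cannot contain spaces. The class attribute is a whitespace-separated list, so this element has multiple classes.

Search by any one of those classes: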
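To illustrate the selector approach a bit further, here is a minimal sketch showing that neither the order of the classes nor the extra whitespace in the attribute affects the match. The second row (same classes, different order) and the name `doc` are made up for this example, and `html.parser` is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: rows 1 and 2 carry the same three classes in a
# different order; row 3 lacks bookings-history-row.
html = '''
<table>
<tr class="admin-bookings-table-row bookings-history-row  paid   "></tr>
<tr class="paid bookings-history-row admin-bookings-table-row"></tr>
<tr class="admin-bookings-table-row  nope  paid   "></tr>
</table>
'''
doc = BeautifulSoup(html, 'html.parser')

# The selector lists the classes in yet another order; a row matches as long
# as it has all three, whatever the order or spacing in the attribute.
rows = doc.select('tr.paid.bookings-history-row.admin-bookings-table-row')
row_count = len(rows)
print(row_count)
```

Only the first two rows should be selected, since the third is missing one of the required classes.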

from bs4 import BeautifulSoup

html = '<tr id="history_row_938220" style="" class="admin-bookings-table-row bookings-history-row  paid   ">'


soup = BeautifulSoup(html, 'html.parser')

print(soup.find_all(attrs={'class': 'admin-bookings-table-row'}))
print(soup.find_all(attrs={'class': 'bookings-history-row'}))
print(soup.find_all(attrs={'class': 'paid'}))

All three calls print the same output:

[<tr class="admin-bookings-table-row bookings-history-row paid " 
 id="history_row_938220" style=""></tr>]
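Since the question also asked about regular expressions, here is a rough sketch of how that can work once you know the class attribute is stored as a list. As far as I can tell, a compiled pattern passed as `class_` is tested against each class name individually, so the pattern should describe a single class, not the whole attribute string. The ids `row1` and `row2` and the name `doc` are invented for the example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with made-up ids so the result is easy to read.
html = '''
<table>
<tr id="row1" class="admin-bookings-table-row bookings-history-row  paid   "></tr>
<tr id="row2" class="admin-bookings-table-row  nope  paid   "></tr>
</table>
'''
doc = BeautifulSoup(html, 'html.parser')

# BeautifulSoup treats class as a multi-valued attribute and returns a list.
first_row = doc.find('tr')
classes = first_row['class']
print(classes)

# The pattern is applied to individual class names, so '^bookings-' matches
# 'bookings-history-row' but not 'admin-bookings-table-row'.
history_rows = doc.find_all('tr', class_=re.compile(r'^bookings-'))
history_ids = [tag['id'] for tag in history_rows]
print(history_ids)
```

A pattern written against the raw attribute text, including its runs of spaces, will most likely not match, which may be why the original attempts returned an empty list.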

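Answer 2 (score: 1)

"I want find_all to match every tr element with a given class attribute that contains multiple spaces."

The whitespace actually indicates that the tag has multiple classes. You can filter for tr tags with more than one class like this: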

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>a title here</title></head>
<body>
<tr class="admin-bookings-table-row bookings-history-row  paid   " id="link1">Elsie</tr>,
<tr class="oneclass" id="link2">Lacie</tr>
<tr class="tag1 tag2" id="link3">Tillie</tr>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# Default to an empty list so a tr without a class attribute does not raise TypeError.
filt = [tag for tag in soup.find_all('tr') if len(tag.get('class', [])) > 1]

filt  # Only 2 of 3 tags returned--excludes tag with just 1 class
# [<tr class="admin-bookings-table-row bookings-history-row paid " id="link1">Elsie</tr>,
#  <tr class="tag1 tag2" id="link3">Tillie</tr>]

Or, using a lambda:

soup.find_all(lambda tag: tag.name == 'tr' and len(tag.get('class', [])) > 1)
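One thing to watch with this kind of filter: `tag.get('class')` returns None for a tag that has no class attribute, and calling `len()` on None raises a TypeError. Below is a small sketch of a more defensive variant; the helper `class_count`, the name `doc`, and the row without a class are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second row has no class attribute at all.
html_doc = """
<table>
<tr class="admin-bookings-table-row bookings-history-row  paid   " id="link1"><td>Elsie</td></tr>
<tr id="link2"><td>Lacie</td></tr>
<tr class="oneclass" id="link3"><td>Tillie</td></tr>
</table>
"""
doc = BeautifulSoup(html_doc, 'html.parser')

def class_count(tag):
    # Fall back to an empty list when the attribute is missing, and skip any
    # empty entries just in case, so only real class names are counted.
    return len([name for name in tag.get('class', []) if name])

multi = doc.find_all(lambda tag: tag.name == 'tr' and class_count(tag) > 1)
multi_ids = [tag['id'] for tag in multi]
print(multi_ids)
```

Only the row with id link1 has more than one class here, so it should be the only one returned.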