I'm trying to use bs4, but I'm having trouble extracting some information from the following HTML:
<table border="1" cellspacing="0" class="browser">
<thead>..</thead>
<tbody class="body">
<tr class="date">..</tr>
<tr class="right">..</tr>
<tr class="right">..</tr>
<tr class="right">..</tr>
<tr class="date">..</tr>
<tr class="right">..</tr>
<tr class="right">..</tr>
<tr class="right">..</tr>
</tbody>
</table>
What I want is the content between two rows of class date, grouped like this:
<tr class="date">..</tr>
<tr class="right">..</tr>
<tr class="right">..</tr>
<tr class="right">..</tr>
and
<tr class="date">..</tr>
<tr class="right">..</tr>
<tr class="right">..</tr>
<tr class="right">..</tr>
I tried:
xx = soup.find_all('tbody',{'class':'body'})
and, to get the corresponding rows of class right, I do this:
yy = []
for i in xx:
    yy.append(i.find_all('tr', {'class': 'right'}))
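...but this gives me all the rows of class right at once, and I would like to know which date row each element in yy belongs to. In short, I want each right row together with its parent date row.
Apologies in advance if the question is confusing!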
Answer 0 (score: 1)
You have to iterate over the children of the tbody tag. This will work:
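from bs4 import Tag

# Keep only the tags; whitespace between rows is parsed as text nodes
tags = [child for child in soup.tbody.contents if isinstance(child, Tag)]
collected_tags = []
latest_date = None
for tag in tags:
    if tag.get('class') == ['date']:
        # Start a new group keyed by this date row
        date_map = {tag: []}
        collected_tags.append(date_map)
        latest_date = tag
        continue
    if collected_tags and tag.get('class') == ['right']:
        # Attach the row to the most recent date row
        collected_tags[-1][latest_date].append(tag)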
collected_tags is now a list of dictionaries, each mapping a date tag to the list of right tags that follow it.
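The same idea can be sketched as a self-contained script. This is only an illustration under assumptions: the cell text (day 1, a, b, ...) is made up because the question elides the row contents, and the groups are stored as (date_row, right_rows) pairs rather than dictionaries, which is a choice made here and not part of the answer above.

```python
from bs4 import BeautifulSoup, Tag

# Sample markup modelled on the question; the cell text is invented for illustration.
html = """
<table border="1" cellspacing="0" class="browser">
<tbody class="body">
<tr class="date"><td>day 1</td></tr>
<tr class="right"><td>a</td></tr>
<tr class="right"><td>b</td></tr>
<tr class="date"><td>day 2</td></tr>
<tr class="right"><td>c</td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

groups = []  # list of (date_row, [right_rows]) pairs
for child in soup.tbody.contents:
    if not isinstance(child, Tag):
        continue  # skip whitespace text nodes between rows
    classes = child.get("class", [])
    if "date" in classes:
        groups.append((child, []))          # a date row opens a new group
    elif "right" in classes and groups:
        groups[-1][1].append(child)         # attach to the latest date row

for date_row, right_rows in groups:
    print(date_row.get_text(strip=True),
          [row.get_text(strip=True) for row in right_rows])
```

Pairs avoid using a Tag as a dictionary key, which keeps the grouping unambiguous even if two date rows happen to contain identical markup.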
Answer 1 (score: 0)
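For each date row, you can iterate over its next_siblings until you reach the next row with date as its class (the code below stops at the first row that is not of class right):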
for date_row in soup.select('table tbody.body tr.date'):
    for elem in date_row.next_siblings:
        if not elem.name:
            # NavigableString (text) element between rows
            continue
        if 'right' not in elem.get('class', []):
            # all done, found a row that doesn't have class="right"
            break
You can collect these rows into a list, or just process them right there in the loop.
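Collecting the rows might look like the following sketch. It assumes made-up cell text (the question elides the row contents) and assumes the text of each date row is unique, so it can serve as a dictionary key; rows_by_date is a name introduced here for illustration. It also uses isinstance(elem, Tag) in place of the elem.name check, purely as a stylistic variation.

```python
from bs4 import BeautifulSoup, Tag

# Sample markup modelled on the question; the cell text is invented for illustration.
html = """
<table border="1" cellspacing="0" class="browser">
<tbody class="body">
<tr class="date"><td>day 1</td></tr>
<tr class="right"><td>a</td></tr>
<tr class="right"><td>b</td></tr>
<tr class="date"><td>day 2</td></tr>
<tr class="right"><td>c</td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows_by_date = {}  # hypothetical name: date text -> list of right-row texts
for date_row in soup.select("table tbody.body tr.date"):
    right_rows = []
    for elem in date_row.next_siblings:
        if not isinstance(elem, Tag):
            continue  # text node between rows
        if "right" not in elem.get("class", []):
            break     # reached the next date row (or any other row)
        right_rows.append(elem.get_text(strip=True))
    rows_by_date[date_row.get_text(strip=True)] = right_rows

print(rows_by_date)
```

If the date text could repeat, a list of (date_row, right_rows) pairs would be the safer container.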
Demo:
>>> for date_row in soup.select('table tbody.body tr.date'):
...     print('Found a date row', date_row)
...     for elem in date_row.next_siblings:
...         if not elem.name:
...             # NavigableString (text) element between rows
...             continue
...         if 'right' not in elem.get('class', []):
...             # all done, found a row that doesn't have class="right"
...             break
...         print('Right row grouped with the date', elem)
...     print()
...
Found a date row <tr class="date">..</tr>
Right row grouped with the date <tr class="right">..</tr>
Right row grouped with the date <tr class="right">..</tr>
Right row grouped with the date <tr class="right">..</tr>
Found a date row <tr class="date">..</tr>
Right row grouped with the date <tr class="right">..</tr>
Right row grouped with the date <tr class="right">..</tr>
Right row grouped with the date <tr class="right">..</tr>
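The question is phrased from the other direction: given a right row, which date row does it belong to? One possible way to answer that, offered as a sketch and not as part of either answer, is to start from each right row and look backwards with find_previous_sibling. The cell text is again made up, and pairs is a name introduced here.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the cell text is invented for illustration.
html = """
<table border="1" cellspacing="0" class="browser">
<tbody class="body">
<tr class="date"><td>day 1</td></tr>
<tr class="right"><td>a</td></tr>
<tr class="right"><td>b</td></tr>
<tr class="date"><td>day 2</td></tr>
<tr class="right"><td>c</td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

pairs = []  # (date text, right text) for every right row
for right_row in soup.select("tbody.body tr.right"):
    # Nearest preceding sibling <tr> whose class includes "date", or None
    date_row = right_row.find_previous_sibling("tr", class_="date")
    date_text = date_row.get_text(strip=True) if date_row else None
    pairs.append((date_text, right_row.get_text(strip=True)))

print(pairs)
```

This does one backward search per right row, so for very long tables the single forward pass shown in the answers is likely the cheaper option.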