漂亮的汤文档提供了属性.contents和.children来访问给定标记的子元素(分别是列表和迭代),并包括Navigable Strings和Tags。我只想要Tag类型的孩子。
我目前正在使用列表理解来完成此任务:
rows=[x for x in table.tbody.children if type(x)==bs4.element.Tag]
但我想知道是否有更好/更多的pythonic /内置方式来获得Tag儿童。
答案 0 :(得分:14)
感谢J.F.Sebastian,以下内容将起作用:
rows=table.tbody.find_all(True, recursive=False)
此处的文档:http://www.crummy.com/software/BeautifulSoup/bs4/doc/#true
在我的情况下,我需要表中的实际行,所以我最终使用了以下内容,这更精确,我认为更具可读性:
rows=table.tbody.find_all('tr')
再次,文档:http://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-using-tag-names
我认为这比迭代Tag的所有子项更好。
使用以下输入:
<table cellspacing="0" cellpadding="0">
<thead>
<tr class="title-row">
<th class="title" colspan="100">
<div style="position:relative;">
President
<span class="pct-rpt">
99% reporting
</span>
</div>
</th>
</tr>
<tr class="header-row">
<th class="photo first">
</th>
<th class="candidate ">
Candidate
</th>
<th class="party ">
Party
</th>
<th class="votes ">
Votes
</th>
<th class="pct ">
Pct.
</th>
<th class="change ">
Change from ‘08
</th>
<th class="evotes last">
Electoral Votes
</th>
</tr>
</thead>
<tbody>
<tr class="">
<td class="photo first">
<div class="photo_wrap"><img alt="P-barack-obama" height="48" src="http://i1.nyt.com/projects/assets/election_2012/images/candidate_photos/election_night/p-barack-obama.jpg?1352320690" width="68" /></div>
</td>
<td class="candidate ">
<div class="winner dem"><img alt="Hp-checkmark@2x" height="9" src="http://i1.nyt.com/projects/assets/election_2012/images/swatches/hp-checkmark@2x.png?1352320690" width="10" />Barack Obama</div>
</td>
<td class="party ">
Dem.
</td>
<td class="votes ">
2,916,811
</td>
<td class="pct ">
57.3%
</td>
<td class="change ">
-4.6%
</td>
<td class="evotes last">
20
</td>
</tr>
<tr class="">
<td class="photo first">
</td>
<td class="candidate ">
<div class="not-winner">Mitt Romney</div>
</td>
<td class="party ">
Rep.
</td>
<td class="votes ">
2,090,116
</td>
<td class="pct ">
41.1%
</td>
<td class="change ">
+4.3%
</td>
<td class="evotes last">
0
</td>
</tr>
<tr class="">
<td class="photo first">
</td>
<td class="candidate ">
<div class="not-winner">Gary Johnson</div>
</td>
<td class="party ">
Lib.
</td>
<td class="votes ">
54,798
</td>
<td class="pct ">
1.1%
</td>
<td class="change ">
–
</td>
<td class="evotes last">
0
</td>
</tr>
<tr class="last-row">
<td class="photo first">
</td>
<td class="candidate ">
div class="not-winner">Jill Stein</div>
</td>
<td class="party ">
Green
</td>
<td class="votes ">
29,336
</td>
<td class="pct ">
0.6%
</td>
<td class="change ">
–
</td>
<td class="evotes last">
0
</td>
</tr>
<tr>
<td class="footer" colspan="100">
<a href="/2012/results/president">President Map</a> |
<a href="/2012/results/president/big-board">President Big Board</a> |
<a href="/2012/results/president/exit-polls?state=il">Exit Polls</a>
</td>
</tr>
</tbody>
</table>