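I have a page that contains several tables. I'm trying to get the table with the class "TabBox", but it seems to be grabbing one named "TabBox2" instead. Any ideas?
There is a "TabBox2" that wraps two tables. It seems to be searching for the first instance of "TabBox", regardless of whether it is named "TabBox2" or just "TabBox".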
table = soup.find("table", { "class" : "GroupBox3" })
rows = table.find_all("tr")
table2 = soup.find("table", { "class" : "TabBox" })
rows2 = table.find_all("tr")
Edit: rows2 should be built from table2 (table2.find_all), not from table. That was the mistake.
Thanks, Game Braniac!
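To check the suspicion that a search for "TabBox" could pick up "TabBox2", here is a minimal sketch. The `sample` markup is a made-up, trimmed-down stand-in for the real page (an outer "TabBox2" table wrapping an inner "TabBox" table), and the `"html.parser"` backend is an assumption. As far as I can tell, a plain string class filter is compared against whole class names, so the two should not be confused.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer "TabBox2" table wrapping an inner "TabBox" table.
sample = """
<table class="TabBox2">
  <tr><td>
    <table class="TabBox">
      <tr><td>inner</td></tr>
    </table>
  </td></tr>
</table>
"""

soup = BeautifulSoup(sample, "html.parser")

# A string class filter is compared against whole class names,
# so "TabBox" is not expected to match the class "TabBox2".
exact = soup.find("table", {"class": "TabBox"})
wrapper = soup.find("table", {"class": "TabBox2"})

print(exact["class"])    # class list of the table matched by "TabBox"
print(wrapper["class"])  # class list of the table matched by "TabBox2"
```

If the two lookups return different tables here, the class filter itself is probably not the cause, and the variable mix-up noted in the edit is the more likely culprit.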
<br />
<table cellspacing="0" cellpadding="4" border="1" class="GroupBox1">
<tbody><tr>
<th><h3>Completion Information</h3></th>
</tr>
<tr>
<td><table width="578" cellspacing="0" cellpadding="4" border="1" class="GroupBox3">
<tbody><tr>
<th width="31%">Well Status Code</th>
<th width="17%" nowrap="nowrap"><div align="center"><strong>Spud Date</strong></div></th>
<th width="28%" nowrap="nowrap"><div align="center">Drilling Completed</div></th>
<th width="24%" nowrap="nowrap"><div align="center">Surface Casing Date</div></th>
</tr>
<tr>
<td nowrap="nowrap">W - Final Completion</td>
<td><div align="center">12/08/2011</div></td>
<td><div align="center">02/14/2012</div></td>
<td><div align="center">12/09/2011</div></td>
</tr>
</tbody></table></td>
</tr>
<tr>
<td><table cellspacing="0" cellpadding="4" border="1" class="TabBox">
<tbody><tr>
<th width="155" nowrap="nowrap">Field Name</th>
<th width="142" nowrap="nowrap">Completed Well Type</th>
<th width="108" nowrap="nowrap"><div align="center">Completed Date</div></th>
<th width="133" nowrap="nowrap"><div align="center">Validated Date</div></th>
</tr>
<tr>
<td nowrap="nowrap">
WOLFBONE (TREND AREA)
</td>
<td nowrap="nowrap"><div align="center">Oil</div>
</td>
<td nowrap="nowrap"><div align="center">02/14/2012</div>
</td>
<td nowrap="nowrap"><div align="center">06/04/2013</div>
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
<br />
Answer 0 (score: 1)
Try the following:
from bs4 import BeautifulSoup
import re
html = r"""
<br />
<table cellspacing="0" cellpadding="4" border="1" class="GroupBox1">
<tbody><tr>
<th><h3>Completion Information</h3></th>
</tr>
<tr>
<td><table width="578" cellspacing="0" cellpadding="4" border="1" class="GroupBox3">
<tbody><tr>
<th width="31%">Well Status Code</th>
<th width="17%" nowrap="nowrap"><div align="center"><strong>Spud Date</strong></div></th>
<th width="28%" nowrap="nowrap"><div align="center">Drilling Completed</div></th>
<th width="24%" nowrap="nowrap"><div align="center">Surface Casing Date</div></th>
</tr>
<tr>
<td nowrap="nowrap">W - Final Completion</td>
<td><div align="center">12/08/2011</div></td>
<td><div align="center">02/14/2012</div></td>
<td><div align="center">12/09/2011</div></td>
</tr>
</tbody></table></td>
</tr>
<tr>
<td><table cellspacing="0" cellpadding="4" border="1" class="TabBox">
<tbody><tr>
<th width="155" nowrap="nowrap">Field Name</th>
<th width="142" nowrap="nowrap">Completed Well Type</th>
<th width="108" nowrap="nowrap"><div align="center">Completed Date</div></th>
<th width="133" nowrap="nowrap"><div align="center">Validated Date</div></th>
</tr>
<tr>
<td nowrap="nowrap">
WOLFBONE (TREND AREA)
</td>
<td nowrap="nowrap"><div align="center">Oil</div>
</td>
<td nowrap="nowrap"><div align="center">02/14/2012</div>
</td>
<td nowrap="nowrap"><div align="center">06/04/2013</div>
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
<br />
"""
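# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")
# find_all returns every table whose class is exactly "TabBox".
tab_box = soup.find_all('table', {'class': 'TabBox'})
for var in tab_box:
    print(var)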
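Once the right table is found, the rows can be pulled out as plain text. The following is only a sketch: `fragment` is a shortened, hypothetical copy of the "TabBox" table from the question, and `rows2` simply reuses the asker's variable name.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring part of the "TabBox" table from the question.
fragment = """
<table class="TabBox">
<tbody><tr>
<th>Field Name</th>
<th>Completed Well Type</th>
</tr>
<tr>
<td>
WOLFBONE (TREND AREA)
</td>
<td><div align="center">Oil</div>
</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(fragment, "html.parser")
table2 = soup.find("table", {"class": "TabBox"})

# Collect the text of every header or data cell, one list per row.
rows2 = []
for tr in table2.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows2.append(cells)

for row in rows2:
    print(row)
```

Note that the rows are taken from `table2`, the table that was just looked up, which is the fix for the mix-up in the question.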