Question

I need to get a table from a page, and the table has no unique identifier such as a unique class. There are several tables on the page:

...
</table>
</font></td>
<td width="18"></td>
<td width="208">
<table>
    <TR><TD><b>APPLE Smart History</b> Table</TD></TR>
    <tr><td><i><b>Date</b></i></td><td><i><b>Ratio</b></i></td></tr>
    <TR><TD>05/31/1994</TD><TD>3 over 1</TD></TR>
    <TR><TD>01/21/2004</TD><TD>2 over 1</TD></TR>
    <TR><TD>10/28/2008</TD><TD>5 over 4</TD></TR>
</table>
<table width="208" cellspacing="0" cellpadding="0" style="margin-bottom: 18px">
...

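I need to select the table whose first cell is <b>APPLE Smart History</b> Table, where APPLE may be an arbitrary string. So the fixed part is Smart History</b> Table, which never changes. Another fixed point is that this table has only two columns, holding the date and the ratio.
I need to get the date and ratio from every row of this table. Ideally I could iterate over the rows: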

for row in rows:
    pass

Any idea what the best selector in BeautifulSoup would be to get this data?

Answer 1

The idea is to find the b element by checking that its text ends with "Smart History" (we'll use a function for that), then locate the first parent table element, and then find all the rows inside it:

# soup is the BeautifulSoup object built from the page
# (recent bs4 releases also accept string= in place of text=)
b = soup.find("b", text=lambda x: x and x.endswith("Smart History"))
table = b.find_parent("table")
rows = table.find_all("tr")
for row in rows:
    pass  # do something with row
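
To make the idea concrete, here is a minimal end-to-end sketch that also pulls the date and ratio out of each row, which is what the question asks for. The sample markup is a tidied-up stand-in for the question's snippet, and the helper name smart_history_rows is hypothetical. Skipping every row that contains a b element is an assumption that fits this sample (only the title and header rows use bold text) and may need adjusting for the real page.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question's snippet (simplified; an assumption).
HTML = """
<table><tr><td>Some other table</td></tr></table>
<table>
    <tr><td><b>APPLE Smart History</b> Table</td></tr>
    <tr><td><i><b>Date</b></i></td><td><i><b>Ratio</b></i></td></tr>
    <tr><td>05/31/1994</td><td>3 over 1</td></tr>
    <tr><td>01/21/2004</td><td>2 over 1</td></tr>
    <tr><td>10/28/2008</td><td>5 over 4</td></tr>
</table>
"""


def smart_history_rows(html):
    """Return (date, ratio) tuples from the "Smart History" table.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Use a function filter to find the <b> whose text ends with the fixed part.
    b = soup.find(
        lambda tag: tag.name == "b" and tag.get_text().endswith("Smart History")
    )
    if b is None:
        return []

    # Walk up to the nearest enclosing table.
    table = b.find_parent("table")

    result = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Keep only two-cell rows without bold text, i.e. skip title and header.
        if len(cells) == 2 and row.find("b") is None:
            date, ratio = (cell.get_text(strip=True) for cell in cells)
            result.append((date, ratio))
    return result


for date, ratio in smart_history_rows(HTML):
    print(date, ratio)
```

Returning an empty list when no matching b element exists keeps the caller's loop simple; raising an exception instead would be just as reasonable if a missing table should count as an error.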
