I tried this:
s = soup.findAll("table", {"class": "view"})
But that only gives me the outer table, and I need the table nested inside it.
<table class="view" >
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
<b>My Face</b><br />
Hello This is me,
</div>
<div class="abc">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</tr>
</table>
The things I want to scrape are:
Hello This is me,
My Face
Dec 6, 2010 by Alis
Answer 0 (score: 1)
s = soup.findAll("table", {"class": "view"})[0].find("table")
If there is only one such table, you can also use .find for the outer table and drop the [0].
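A minimal sketch of the .find variant, assuming the bs4 package (BeautifulSoup 4) with the built-in html.parser backend. The markup is a hypothetical, trimmed-down version of the question's HTML, and the variable names are for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup with the same nesting as in the question.
html = """
<table class="view">
  <tr>
    <td>
      <table>
        <tr><td><b>My Face</b></td></tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns the first match (or None), so no [0] index is needed.
outer = soup.find("table", {"class": "view"})

# find only searches descendants, so this cannot return the outer table itself.
inner = outer.find("table") if outer is not None else None

inner_text = inner.b.get_text() if inner is not None else None
print(inner_text)
```

Checking for None before chaining avoids an AttributeError when the page has no table with class "view".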
Answer 1 (score: 1)
Here is the HTML with better formatting:
<table class="view" >
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span>
<span class="aaaaaaa " title="aaaaaaaaaaa">
<span>aaaaaaaaaaaaa</span>
</span>
</span>
<b>My Face</b>
<br />
Hello This is me,
</div>
<div class="abc">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
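Note: I added a closing </td> tag, because one was missing.
innerTable = soup.find("table", {"class": "view"}).tr.td.table  # the table in the first cell of the first row
# nextSibling can return the whitespace text node between the tags, so ask for the next <div> sibling explicitly
innerDiv = innerTable.find("div", {"style": "adasdasd"}).findNextSibling("div")  # the div that holds the content you want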
This gets you the div that contains the content. From there it takes only a little more parsing to pull out what you actually need.
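The remaining parsing could look roughly like the sketch below. It assumes the bs4 package (BeautifulSoup 4) with the html.parser backend and the corrected markup from this answer. The names cell, content_div, title, message and date_line are made up for illustration, and the approach relies on the layout staying exactly as shown.

```python
from bs4 import BeautifulSoup, NavigableString

# The corrected markup from this answer.
html = """
<table class="view">
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span></span>
<b>My Face</b>
<br />
Hello This is me,
</div>
<div class="abc">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
inner_table = soup.find("table", {"class": "view"}).find("table")
cell = inner_table.find("td")

# The content div is the first <div> sibling after the styled div.
content_div = cell.find("div", {"style": "adasdasd"}).find_next_sibling("div")

# "My Face" sits in the <b> tag.
title = content_div.find("b").get_text(strip=True)

# "Hello This is me," is a bare text node directly inside the content div,
# so keep only the non-blank direct text children.
message = " ".join(
    child.strip()
    for child in content_div.children
    if isinstance(child, NavigableString) and child.strip()
)

# The date line is the div with class "abc" that is a direct child of the cell;
# recursive=False skips the other "abc" div nested inside the styled div.
date_line = cell.find_all("div", {"class": "abc"}, recursive=False)[-1].get_text(strip=True)

print(message)
print(title)
print(date_line)
```

Filtering on direct children is what separates the message from the text inside the nested spans. If the real page nests things differently, the selectors would need adjusting.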