How do I parse a table inside a table using Beautiful Soup?

Time: 2010-11-12 11:52:04

Tags: python beautifulsoup

I tried this: s = soup.findAll("table", {"class": "view"}), but it gives me just the one table. I need the table inside that table.

<table class="view" >
    <tr>
        <td width="46%" valign="top">
        <table>
    <tr>
        <td>
            <div style="adasdasd">
                <div class="abc">dasdsadasdasdas</div>
            </div>
            <div>
                <span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
                <b>My Face</b><br />
                    Hello This is me,
                </div>
            <div class="abc"">
                    Dec 6, 2010 by Alis
                </div>
        </td>
    </tr>
        </table>
    </tr>
    </table>

The things I want to scrape are:

    Hello This is me,

    My Face

    Dec 6, 2010 by Alis

2 answers:

Answer 0 (score: 1)

s = soup.findAll("table", {"class": "view"})[0].find("table")

If there is only one such table, you can also use .find for the outer table and drop the [0].
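As a minimal sketch of that idea, here it is with the current bs4 package and Python's built-in "html.parser" (both are assumptions on my part; the question predates bs4). The cut-down HTML and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Cut-down, hypothetical stand-in for the page in the question
html = """
<table class="view">
  <tr>
    <td>
      <table>
        <tr><td>inner cell</td></tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match directly, so no [0] is needed
outer = soup.find("table", {"class": "view"})

# Searching from the outer table only looks at its descendants,
# so the first <table> found here is the nested one
inner = outer.find("table")

inner_text = inner.get_text(strip=True)
print(inner_text)
```

Calling find on a tag rather than on the whole soup is what restricts the search to the nested table.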

Answer 1 (score: 1)

Here is some better-formatted HTML:

<table class="view" >
    <tr>
        <td width="46%" valign="top">
            <table>
                <tr>
                    <td>
                        <div style="adasdasd">
                            <div class="abc">dasdsadasdasdas</div>
                        </div>
                        <div>
                            <span>
                                <span class="aaaaaaa " title="aaaaaaaaaaa">
                                    <span>aaaaaaaaaaaaa</span>
                                </span>
                            </span>
                            <b>My Face</b>
                            <br />
                            Hello This is me,
                        </div>
                        <div class="abc">
                            Dec 6, 2010 by Alis
                        </div>
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>

Note: I actually added a closing </td> tag, because the original was missing one.

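innerTable = soup.find("table", {"class": "view"}).tr.td.table  # Gets the table in the first cell of the first row

# Gets the div in which all of your content resides. findNextSibling("div") is used
# because .nextSibling would return the whitespace text node between the two divs.
innerDiv = innerTable.find("div", {"style": "adasdasd"}).findNextSibling("div")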

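This gets you the div that contains all of the content. From there it only takes a little more parsing to pull out the pieces you actually need.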
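That last bit of parsing could look roughly like the sketch below. It assumes the bs4 package with the built-in "html.parser" and the well-formed HTML above; the variable names are mine. It uses find_next_sibling("div") rather than the plain next_sibling attribute, because the whitespace between two tags counts as a sibling text node.

```python
from bs4 import BeautifulSoup

html = """
<table class="view">
  <tr>
    <td width="46%" valign="top">
      <table>
        <tr>
          <td>
            <div style="adasdasd">
              <div class="abc">dasdsadasdasdas</div>
            </div>
            <div>
              <span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span></span>
              <b>My Face</b>
              <br />
              Hello This is me,
            </div>
            <div class="abc">
              Dec 6, 2010 by Alis
            </div>
          </td>
        </tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
inner_table = soup.find("table", {"class": "view"}).find("table")

# The content div is the first <div> sibling after the style="adasdasd" div
marker_div = inner_table.find("div", {"style": "adasdasd"})
content_div = marker_div.find_next_sibling("div")

# "My Face" is the text of the <b> tag
title = content_div.find("b").get_text(strip=True)

# "Hello This is me," is the bare text node right after <br />
message = content_div.find("br").next_sibling.strip()

# The date line lives in the next sibling div with class "abc"
date_line = content_div.find_next_sibling("div", {"class": "abc"}).get_text(strip=True)

print(message)
print(title)
print(date_line)
```

This depends on the exact layout shown here (the text directly following the br tag, and the date div directly following the content div), so on a real page it would be worth checking each find result for None before using it.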