Reading the rows of an outer table with BeautifulSoup

Date: 2015-09-23 07:46:04

Tags: python beautifulsoup

I have some HTML containing the table shown below. One of the outer table's rows holds another table.

<table>
    <tbody>
        <tr> <td>Some Content </td> <td>Some Content </td> </tr>
        <tr> <td>Some Content </td> <td>Some Content </td> </tr>
        <tr>
            <table>
                <tbody>
                    <tr> <td>Some Content </td> <td>Some Content </td> </tr>
                    <tr> <td>Some Content </td> <td>Some Content </td> </tr>
                    <tr> <td>Some Content </td> <td>Some Content </td> </tr>
                </tbody>
            </table>
        </tr>
    </tbody>
</table>

When I use beautifulsoup to read the tr elements, the rows of the inner table are read as well.

How can I read only the rows of the outer table?

2 answers:

Answer 0: (score: 2)

You can do this with the recursive=False parameter, which restricts a search to an element's direct children.

Counting the rows of the outer table in the question this way returns 3.

Answer 1: (score: 0)

You can try:

[x.string for x in soup.select('table > tbody > tr > td') if x not in soup.select('table > tbody > tr > table > tbody > tr > td')]

Result: only the td contents of the outer table. Note: the not in test compares tags by their content rather than by identity, so if the outer and inner td elements are equal (as they are in the question's sample, where every cell is "Some Content "), it returns an empty list.
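As a rough illustration of the recursive=False approach from Answer 0, here is a minimal sketch. It assumes the html.parser backend, which keeps the question's table-inside-tr nesting as written. The cell texts and the variable names (HTML, outer_tbody, outer_rows, all_rows, first_cells) are placeholders chosen for this example, not taken from either answer.

```python
from bs4 import BeautifulSoup

# Same shape as the question's markup; cell texts are made up for illustration.
HTML = """
<table>
  <tbody>
    <tr><td>Outer 1a</td><td>Outer 1b</td></tr>
    <tr><td>Outer 2a</td><td>Outer 2b</td></tr>
    <tr>
      <table>
        <tbody>
          <tr><td>Inner 1a</td><td>Inner 1b</td></tr>
          <tr><td>Inner 2a</td><td>Inner 2b</td></tr>
          <tr><td>Inner 3a</td><td>Inner 3b</td></tr>
        </tbody>
      </table>
    </tr>
  </tbody>
</table>
"""

# Assumption: html.parser does not rearrange the invalid table-in-tr nesting.
soup = BeautifulSoup(HTML, "html.parser")

# The first <table> in document order is the outer one; take its <tbody>.
outer_tbody = soup.find("table").find("tbody")

# recursive=False looks only at direct children, so inner rows are skipped.
outer_rows = outer_tbody.find_all("tr", recursive=False)

# For comparison: the default (recursive) search also picks up the inner rows.
all_rows = soup.find_all("tr")

# Direct <td> children of the first outer row.
first_cells = [td.get_text(strip=True)
               for td in outer_rows[0].find_all("td", recursive=False)]

print(len(outer_rows), len(all_rows), first_cells)
```

Other parsers, such as lxml or html5lib, may restructure the invalid nesting while parsing, so the row counts could differ with them.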