Python BeautifulSoup: iterating over a table

Time: 2014-08-08 21:08:05

Tags: python beautifulsoup html-table

I want to iterate over every TD of every TR tag. So, for example, if I get all the rows of the table:

trList = tbody.findAll('tr')

Afterwards, I want to get all the TD tags of each TR element separately.

Something like this:

trList[0]:
  td[0]
  td[1] # I wanted to get this TD of every TR
  td[2]

trList[1]:
  td[0]
  td[1] # this one as well
  td[2]

Normally I would use nested loops to get it.

Is that possible?

2 answers:

Answer 0: (score: 3)

The nth-of-type CSS selector would help:

from bs4 import BeautifulSoup


data = """
<table>
    <tr>
        <td>1</td>
        <td>2</td>
        <td>3</td>
    </tr>

    <tr>
        <td>4</td>
        <td>5</td>
        <td>6</td>
    </tr>

    <tr>
        <td>7</td>
        <td>8</td>
        <td>9</td>
    </tr>
</table>
"""


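# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')
# Select the second td of every row
for td in soup.select('table > tr > td:nth-of-type(2)'):
    print(td.text)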

Prints:

2
5
8
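
One caveat, offered as a sketch rather than as part of the original answer: the question reads its rows from a `tbody` variable, and when the markup contains a `<tbody>` element the child combinator `table > tr` no longer matches anything. Dropping the `table >` prefix avoids that. The sample markup and the `second_cells` name below are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with an explicit <tbody>, as the question's `tbody` variable suggests
html = """
<table>
  <tbody>
    <tr><td>1</td><td>2</td><td>3</td></tr>
    <tr><td>4</td><td>5</td><td>6</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# `tr > td` only requires each td to be a direct child of its row,
# so it matches whether or not a <tbody> sits between table and tr
second_cells = [td.get_text() for td in soup.select("tr > td:nth-of-type(2)")]
print(second_cells)
```

With this markup, the stricter `table > tr > td:nth-of-type(2)` selector returns an empty list, because each `tr` is a child of `tbody` rather than of `table`.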

Answer 1: (score: 3)

Yes, you can use the same findAll function:

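# tbody is the element from the question that contains the rows
trList = tbody.findAll('tr')
for tr in trList:
    # Search only inside the current row
    tdList = tr.findAll('td')
    for td in tdList:
        # here you have each td of the current row
        print(td.text)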
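
To pick out only the second TD of every row, as the question asks, the nested lookup can be combined with indexing. This is a minimal sketch under assumptions: the sample table and the `second_column` name are made up for illustration, and it uses `find_all`, the current bs4 spelling of `findAll`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table, used only for illustration
html = """
<table>
  <tr><td>1</td><td>2</td><td>3</td></tr>
  <tr><td>4</td><td>5</td><td>6</td></tr>
  <tr><td>7</td><td>8</td><td>9</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

second_column = []
for tr in soup.find_all("tr"):
    tds = tr.find_all("td")
    # Skip rows that have fewer than two cells, such as header-only rows
    if len(tds) > 1:
        second_column.append(tds[1].get_text())

print(second_column)
```

The length check keeps the loop from raising an IndexError on rows that do not have a second cell.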