Question

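I am trying to scrape the year and winners (first and second columns) from the "List of finals matches" table (the second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals. I am using the following code: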

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
soup = BeautifulSoup(urllib2.urlopen(url).read())
soup.findAll('table')[0].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

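With the code above, I get the first and third columns just fine. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals, it cannot find tbody as an element, even though I can see tbody when I inspect the element in the browser.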

url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())

print soup.findAll('table')[2]

soup.findAll('table')[2].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

Here is the error I get when the print line is commented out:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
      8 
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents

AttributeError: 'NoneType' object has no attribute 'findAll'


Answer 1

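If you look at the page with your browser's inspection tool, the browser inserts tbody tags into what it shows you.

The page source may or may not contain them. If you really want to know, I suggest checking the "View Source" view instead.

Either way, you do not need to go through tbody at all. Simply:

soup.findAll('table')[0].findAll('tr') should work.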
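
To illustrate the point, here is a minimal sketch of why .tbody can come back as None. It assumes Python 3 with BeautifulSoup 4 (the bs4 package) and the built-in html.parser, which differs from the older BeautifulSoup import used in the question. SAMPLE_HTML is made-up markup for illustration, not the actual Wikipedia source:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a table written without <tbody>, as page sources often are.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Winners</th></tr>
  <tr><th>1930</th><td>Uruguay</td></tr>
  <tr><th>1934</th><td>Italy</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find_all("table")[0]

# html.parser keeps the markup as written, so there is no <tbody> to find
# and attribute access returns None (hence the AttributeError in the question).
tbody = table.tbody

# Searching the table directly works whether or not <tbody> is present.
rows = table.find_all("tr")
print(tbody, len(rows))
```

Some parsers (html5lib, for example) normalise the tree the way a browser does and add tbody themselves, so searching for tr directly is the option that does not depend on the parser.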

Answer 2

import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())
for tr in soup.findAll('table')[2].findAll('tr'):
    # get data from each row here
    pass

Then search within the table for whatever you need :)
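
One possible way to fill in the "get data" step is sketched below. It assumes Python 3 with BeautifulSoup 4 (bs4). The helper name first_two_columns and the SAMPLE_HTML string are hypothetical, and the real Wikipedia table may be laid out differently (here the year is assumed to sit in a th cell and the winner in the first td):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the finals table; the real page may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Winners</th><th>Score</th></tr>
  <tr><th>1930</th><td>Uruguay</td><td>4-2</td></tr>
  <tr><th>1934</th><td>Italy</td><td>2-1</td></tr>
</table>
"""

def first_two_columns(table):
    """Return (first, second) cell text for each data row of a table."""
    results = []
    for tr in table.find_all("tr"):
        # Header rows contain only <th> cells; skip them.
        if tr.find("td") is None:
            continue
        # Collect header and data cells together, in document order.
        cells = tr.find_all(["th", "td"])
        if len(cells) >= 2:
            results.append((cells[0].get_text(strip=True),
                            cells[1].get_text(strip=True)))
    return results

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
pairs = first_two_columns(soup.find_all("table")[0])
for year, winner in pairs:
    print(year, winner)
```

On the live page you would pass the table you actually want (the answers here use index 2). That index depends on the current page layout, so it is worth printing the table first to confirm it is the right one.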

Answer 3

Just run the following code directly.

tr_elements = soup.find_all('table')[2].find_all('tr')

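This gives you access to all the <tr> elements; you will need a for loop to go through them (there are other ways to iterate as well). Do not try to find tbody: it is added by default by the browser, so it may be missing from the HTML you downloaded.

Note:

If you have trouble getting the tag you want, use the .decompose() method to remove the tags that come before it.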
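
As one reading of the note above, the sketch below uses .decompose() to drop unwanted tags from the tree before reading a cell. It assumes Python 3 with BeautifulSoup 4 (bs4). The sample markup and the choice of sup as the tag to remove are illustrative assumptions only:

```python
from bs4 import BeautifulSoup

# Hypothetical cell with a footnote marker that pollutes the extracted text.
SAMPLE_HTML = "<table><tr><th>1930</th><td>Uruguay<sup>[1]</sup></td></tr></table>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Remove every <sup> tag (and its contents) from the tree before reading text.
for sup in soup.find_all("sup"):
    sup.decompose()

winner = soup.find("td").get_text(strip=True)
print(winner)
```

decompose() changes the parsed tree in place, so run it only on tags you are sure you will not need later.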

How to get a table from Python Beautiful Soup?
