Question

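I am trying to collect information from a website that hosts a ship database.

I am trying to extract the information with BeautifulSoup, but so far it does not work. I have searched online and tried several suggested solutions, but could not get the code to run correctly.

I suspect I need to change the line table = soup.find_all("table", { "class" : "table1" }), because the page contains five tables with class='table1' but my code only finds one.

Do I have to write a loop over the tables? When I tried that, I could not get it to work. In addition, the next line, table_body = table.find('tbody'), raises an error:

AttributeError: 'ResultSet' object has no attribute 'find'

This looks like a conflict between my code and BeautifulSoup's source, where ResultSet subclasses list. Do I have to iterate over that list?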

from urllib import urlopen

shipUrl = 'http://www.veristar.com/portal/veristarinfo/generalinfo/registers/seaGoingShips?portal:componentId=p_efff31ac-af4c-4e89-83bc-55e6d477d131&interactionstate=JBPNS_rO0ABXdRAAZudW1iZXIAAAABAAYwODkxME0AFGphdmF4LnBvcnRsZXQuYWN0aW9uAAAAAQAYc2hpcFNlYXJjaFJlc3VsdHNTZXRTaGlwAAdfX0VPRl9f&portal:type=action&portal:isSecure=false'
shipPage = urlopen(shipUrl)

from bs4 import BeautifulSoup
soup = BeautifulSoup(shipPage)
table = soup.find_all("table", { "class" : "table1" })
print table
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for tr in rows:
    cols = tr.find_all('td')
    for td in cols:
        print td
    print

Answer 1

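A few things:

As Kevin mentioned, you need a for loop to iterate over the list returned by find_all.

Not every table has a tbody. For those tables find returns None, and calling find_all on None raises an AttributeError, so the code that uses the result of find has to be wrapped in a try block.

When printing, use the .text attribute so that the text value is printed rather than the tag itself.

Here is the revised code: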

from urllib import urlopen  # Python 2; on Python 3 this lives in urllib.request
from bs4 import BeautifulSoup

shipUrl = 'http://www.veristar.com/portal/veristarinfo/generalinfo/registers/seaGoingShips?portal:componentId=p_efff31ac-af4c-4e89-83bc-55e6d477d131&interactionstate=JBPNS_rO0ABXdRAAZudW1iZXIAAAABAAYwODkxME0AFGphdmF4LnBvcnRsZXQuYWN0aW9uAAAAAQAYc2hpcFNlYXJjaFJlc3VsdHNTZXRTaGlwAAdfX0VPRl9f&portal:type=action&portal:isSecure=false'
shipPage = urlopen(shipUrl)

soup = BeautifulSoup(shipPage)
# find_all returns a ResultSet (a list of tags), so loop over it
table = soup.find_all("table", { "class" : "table1" })
for mytable in table:
    # find returns None when the table has no tbody
    table_body = mytable.find('tbody')
    try:
        rows = table_body.find_all('tr')
        for tr in rows:
            cols = tr.find_all('td')
            for td in cols:
                print td.text
    except AttributeError:
        # table_body is None, so there are no rows to read
        print "no tbody"

This produces the following output:

Register Number:
08910M
IMO Number:
9365398
Ship Name:
SUPERSTAR
Call Sign:
ESIY
.....
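
As a minimal sketch for Python 3, the same loop can be tried offline against a small inline HTML sample instead of the live page. The sample markup, the SAMPLE_HTML constant and the extract_tables helper are hypothetical and are not taken from the real site. The sketch also swaps the try block for an explicit None check, and it assumes that falling back to the table itself is acceptable when there is no tbody.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two tables with class "table1",
# only the first of which has an explicit <tbody>.
SAMPLE_HTML = """
<table class="table1">
  <tbody>
    <tr><td>Register Number:</td><td>08910M</td></tr>
    <tr><td>Ship Name:</td><td>SUPERSTAR</td></tr>
  </tbody>
</table>
<table class="table1">
  <tr><td>Call Sign:</td><td>ESIY</td></tr>
</table>
"""


def extract_tables(html):
    """Return one list of rows per table; each row is a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    # find_all returns a ResultSet (a list subclass), so iterate over it.
    for table in soup.find_all("table", class_="table1"):
        # find returns a single Tag, or None when there is no <tbody>.
        body = table.find("tbody")
        if body is None:
            body = table  # read the rows directly from the table
        rows = []
        for tr in body.find_all("tr"):
            # get_text(strip=True) gives the cell text without the markup.
            rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
        tables.append(rows)
    return tables


if __name__ == "__main__":
    for rows in extract_tables(SAMPLE_HTML):
        for cells in rows:
            print(" ".join(cells))
```

Returning the rows as lists rather than printing them inside the loop keeps the parsing separate from the output, so the same function could be pointed at the downloaded page once the selector is confirmed to match.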
