Question

I'm trying to scrape the table from this web page. I'm not sure whether I'm grabbing the right tag. Here is what I have so far.

from bs4 import BeautifulSoup
import requests

page='http://www.airchina.com.cn/www/en/html/index/ir/traffic/'

r=requests.get(page)

soup=BeautifulSoup(r.text)

test=soup.findAll('div', {'class': 'main noneBg'})
rows=test.findAll("td")

Is the table the div with class main noneBg? When I hover over that tag, it highlights the table.

Answer 1

The table you need is inside an iframe that is loaded from a different URL.
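One way to discover such an iframe URL is to list the src attribute of every iframe in the outer page and resolve it against the page URL. The following is only a sketch: the sample_html string and the iframe_urls helper are hypothetical, made up for illustration, and the real page's markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_urls(html, base_url):
    """Return the absolute URL of every <iframe> that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, frame["src"])
            for frame in soup.find_all("iframe", src=True)]


# Hypothetical stand-in for the outer page; the real markup is an assumption.
sample_html = """
<div class="main noneBg">
  <iframe src="/www/jsp/airlines_operating_data/exlshow_en.jsp"></iframe>
</div>
"""

base = "http://www.airchina.com.cn/www/en/html/index/ir/traffic/"
urls = iframe_urls(sample_html, base)
for url in urls:
    print(url)
```

With a live page, you would pass r.text from requests.get(page) instead of sample_html, then request each returned URL separately.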

Here is how you can grab it (note that the URL is different):

from bs4 import BeautifulSoup
import requests

page = 'http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp'

r = requests.get(page)

soup = BeautifulSoup(r.text, 'html.parser')

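# the table sits in the second div under div.mainRight
div = soup.find('div', class_='mainRight').find_all('div')[1]
table = div.find('table', recursive=False)
# recursive=False keeps rows and cells of nested tables out of the loop
for row in table.find_all('tr', recursive=False):
    for cell in row('td', recursive=False):
        print(cell.text.strip())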

This prints:

Feb 2014
% change vs Feb 2013
% change vs Jan 2014
Cumulative Feb 2014
% cumulative change
1.Traffic
1.RTKs (in millions)
1407.8
...

Note that because the page contains nested tables, you need to use recursive=False.
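To see what recursive=False changes, here is a small self-contained sketch on a made-up HTML snippet (not the Air China page). By default find_all searches all descendants, so rows and cells of an inner table are returned too; with recursive=False only direct children are matched.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer table whose first row contains a nested table.
html = """
<table id="outer">
  <tr><td>outer A</td><td><table><tr><td>inner 1</td></tr></table></td></tr>
  <tr><td>outer B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("table", id="outer")

# Default search descends into the nested table as well.
all_rows = outer.find_all("tr")
# Only <tr> tags that are direct children of the outer table.
direct_rows = outer.find_all("tr", recursive=False)

# The same distinction applies to the cells of a single row.
all_cells = direct_rows[0].find_all("td")
direct_cells = direct_rows[0].find_all("td", recursive=False)

print(len(all_rows), len(direct_rows))
print(len(all_cells), len(direct_cells))
```

This assumes the html.parser backend, which does not insert tbody elements. A parser that does add tbody (html5lib, for example) would put the rows one level deeper, and the direct-child search on the table would need adjusting.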

Python BeautifulSoup grab table
