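I'm trying to extract the table of flight arrivals (columns: Flight, Airline, Origin, Date, Scheduled, Estimated, Status) from the airport website, but I get the following error: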
IndexError                                Traceback (most recent call last)
<ipython-input-39-2f7369a95ba9> in <module>()
      6         for cl in cols:
      7             dv = cl.findAll('div')
----> 8             if 'col-xs-12 col-sm-6' in dv[0]['class']:
      9                 flight, carrier, origin, date, scheduled, estimated, status = [c.text for c in dv]
     10                 print(flight, carrier, origin, date, scheduled, estimated, status)

IndexError: list index out of range
I have searched Stack Overflow for a solution but couldn't find one. Here is my code:
# import libraries
import urllib3
import requests
from bs4 import BeautifulSoup

# query the website and return the html to the variable 'page'
page = requests.get("https://www.aucklandairport.co.nz/flights").text
soup = BeautifulSoup(page)
tbody = soup.findAll('tbody')
for tb in tbody:
    rows = tb.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        for cl in cols:
            dv = cl.findAll('div')
            if 'col-xs-12 col-sm-6' in dv[0]['class']:
                flight, carrier, origin, date, scheduled, estimated, status = [c.text for c in dv]
                print(flight, carrier, origin, date, scheduled, estimated, status)
Thanks for your help.
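The traceback means that dv, the list of div elements found in one td, was empty, so dv[0] had nothing to index. The sketch below reproduces this on a small hand-written fragment (a hypothetical stand-in, not the real page's markup). It also shows a second problem with the test on line 8: BeautifulSoup parses the class attribute into a list of separate class names, so the combined string 'col-xs-12 col-sm-6' is never one of its elements, even once the IndexError is avoided.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, not the real page: the first <td> holds only an
# <img> (no <div>), like the logo cell described in the first answer.
html = """
<table><tbody>
  <tr>
    <td class="logo"><img src="logo.png"></td>
    <td><div class="col-xs-12 col-sm-6"><div>EK406</div></div></td>
  </tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
tds = soup.find_all("td")

dv = tds[0].find_all("div")
print(len(dv))  # 0, so dv[0] would raise IndexError

# "class" is parsed into a list of separate class names, so the
# combined string "col-xs-12 col-sm-6" is never one of its elements.
outer = tds[1].find("div")
print(outer["class"])
print("col-xs-12 col-sm-6" in outer["class"])  # False

# Guarded version: skip cells without a <div>, then test each class name.
matches = []
for td in tds:
    divs = td.find_all("div")
    if not divs:
        continue
    classes = divs[0].get("class", [])
    if "col-xs-12" in classes and "col-sm-6" in classes:
        matches.append(divs[0])
print(len(matches))  # 1
```

Skipping empty cells and checking class names one at a time addresses both issues; the answers below take a different route and select the relevant rows and cells directly.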
Answer 0 (score: 2)
The problem is that the first td under each tr has no div, which is why dv comes back empty and dv[0] raises IndexError. Change your code to:
# import libraries
import requests
from bs4 import BeautifulSoup

# query the website and return the html to the variable 'page'
page = requests.get("https://www.aucklandairport.co.nz/flights").text
soup = BeautifulSoup(page, "html.parser")
tbody = soup.find('tbody')
rows = tbody.findAll('tr', {'class': 'flight-toggle'})  # find tr whose class = flight-toggle
for tr in rows:
    cols = tr.findAll('td', class_=lambda x: x != 'logo')  # find td whose class != logo (excludes the first td)
    dv0 = cols[0].find('div').findAll('div')  # flight, carrier, origin are in the second td
    flight, carrier, origin = [c.text.strip() for c in dv0]
    dv1 = cols[1].find('div').findAll('div')  # date, scheduled are in the third td
    date, scheduled = [c.text.strip() for c in dv1]
    dv2 = cols[2].find('div').findAll('div')  # estimated, status are in the fourth td
    estimated, status = [c.text.strip() for c in dv2[1:]]  # skip the first div
    print(flight, carrier, origin, date, scheduled, estimated, status)
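This prints (the output below is from Python 2, where print(...) with several arguments prints them as a tuple):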
(u'EK406', u'', u'Dubai / Melbourne', u'18 Nov', u'01:55pm', u'02:47pm', u'Processing')
(u'QF8762', u'EK406', u'Dubai / Melbourne', u'18 Nov', u'01:55pm', u'02:47pm', u'Processing')
(u'EK434', u'', u'Dubai / Brisbane', u'18 Nov', u'02:45pm', u'02:49pm', u'Processing')
...
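Because the live page can change, it can help to check these parsing steps against a fixed fragment. The sketch below wraps the same steps in a function and runs it on hypothetical markup modelled on the answer's comments (a logo td followed by three tds, each wrapping a div of divs). The fragment, the function name parse_arrivals, and the placeholder first div in the last cell are assumptions, not the site's actual HTML. It drops the logo cell with an explicit list comprehension instead of the class_ lambda, which keeps the intent easy to read.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the answer's comments; the live page may differ.
html = """
<table><tbody>
  <tr class="flight-toggle">
    <td class="logo"><img src="ek.png"></td>
    <td><div>
      <div>EK406</div><div></div><div>Dubai / Melbourne</div>
    </div></td>
    <td><div>
      <div>18 Nov</div><div>01:55pm</div>
    </div></td>
    <td><div>
      <div>icon</div><div>02:47pm</div><div>Processing</div>
    </div></td>
  </tr>
</tbody></table>
"""


def parse_arrivals(page_html):
    """Return one tuple per tr.flight-toggle row (hypothetical helper)."""
    soup = BeautifulSoup(page_html, "html.parser")
    rows = []
    for tr in soup.find("tbody").find_all("tr", class_="flight-toggle"):
        # Drop the logo cell, which has no <div> to index into.
        cols = [td for td in tr.find_all("td") if "logo" not in td.get("class", [])]
        flight, carrier, origin = [d.text.strip() for d in cols[0].find("div").find_all("div")]
        date, scheduled = [d.text.strip() for d in cols[1].find("div").find_all("div")]
        # The first inner div of the last cell holds no data here, so skip it.
        estimated, status = [d.text.strip() for d in cols[2].find("div").find_all("div")[1:]]
        rows.append((flight, carrier, origin, date, scheduled, estimated, status))
    return rows


for row in parse_arrivals(html):
    print(row)
```

Under Python 3, printing each tuple gives output in the same shape as above, without the u'' prefixes.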
Answer 1 (score: 1)
There's more than one way to skin a cat. Here is another approach that achieves the same thing.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.aucklandairport.co.nz/flights")
soup = BeautifulSoup(response.text, "lxml")
table = soup.select(".flights-table")[0]
for items in table.select("tr.flight-toggle"):
    data = ' '.join([' '.join(item.text.split()) for item in items.select("td")])
    print(data.strip())
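The key step in this answer is ' '.join(text.split()): str.split() with no argument splits on any run of whitespace, so rejoining the pieces collapses the newlines and indentation inside each cell into single spaces. A minimal offline sketch, assuming a hypothetical fragment that uses only the class names the answer selects on (flights-table, flight-toggle), the built-in html.parser instead of lxml, and a hypothetical helper named flatten_rows. It uses select_one, which returns the first match (or None) instead of indexing select(...)[0].

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real cells contain more markup.
html = """
<table class="flights-table"><tbody>
  <tr class="flight-toggle">
    <td class="logo"></td>
    <td>
      EK406
      Dubai / Melbourne
    </td>
    <td>18 Nov   01:55pm</td>
    <td>02:47pm
        Processing</td>
  </tr>
</tbody></table>
"""


def flatten_rows(page_html):
    """Collapse each tr.flight-toggle into one space-separated line."""
    soup = BeautifulSoup(page_html, "html.parser")
    table = soup.select_one(".flights-table")  # first match, or None
    lines = []
    for tr in table.select("tr.flight-toggle"):
        # Squeeze every whitespace run inside a cell down to one space.
        cells = [" ".join(td.text.split()) for td in tr.select("td")]
        lines.append(" ".join(cells).strip())
    return lines


for line in flatten_rows(html):
    print(line)
```

The final .strip() matters: the empty logo cell contributes an empty string, which would otherwise leave a leading space on every line.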