我想从this site抓取时间表。
我特别希望
中包含的文本div #tabs-4 > h3 > a > span
我已经尝试过了,但是它只返回第一个项目,而不返回该项目下的完整树。这个网站足够疯狂地使用#tabs-4
四次。
departures_table = soup.select('#tabs-4')
for div in alilauro_departures_table:
span = div.select('span')
alilauro_timetable.append({
"COMPANY": span[2].text,
"DEPARTURE DATE TIME" : span[0].text,
"ARRIVAL DATE TIME": span[4].text,
"ITINERARIO": span[1].text,
"FERRY NAME": span[3].text
})
答案 0 :(得分:1)
请尝试以下代码。由于您已经在使用#tab
链接,因此无需选择url
。
import bs4
import re
import requests
html_doc=requests.get("https://alilauronew.forth-crs.gr/italian_b2c/npgres.exe?func=TT&tripcount=1&StartDateLeg1=22%2F02%2F2019&StartDateLeg2=22%2F02%2F2019&StartDateLeg3=22%2F02%2F2019&StartDateLeg4=22%2F02%2F2019&Leg1ilabel=NAPOLI%28BEVERELLO%29&Leg1i=BEV&Leg1iilabel=ISCHIA&Leg1ii=ISH&Leg1Date=22%2F02%2F2019&Leg2ilabel=ISCHIA&Leg2i=ISH&Leg2iilabel=NAPOLI%28BEVERELLO%29&Leg2ii=BEV&Leg2Date=22%2F02%2F2019&Leg3ilabel=NAPOLI%28BEVERELLO%29&Leg3i=BEV&Leg3iilabel=FORIO&Leg3ii=FRD&Leg3Date=22%2F02%2F2019&Leg4ilabel=FORIO&Leg4i=FRD&Leg4iilabel=NAPOLI%28BEVERELLO%29&Leg4ii=BEV&Leg4Date=22%2F02%2F2019&TotalPassengers=1&TotalVehicles=0")
soup = bs4.BeautifulSoup(html_doc.text, 'html.parser')
headers=soup.find_all('h3' , id=re.compile("Leg1"))
for h in headers:
spans=h.find_all('span')
for span in spans:
print(span.text)
答案 1 :(得分:1)
主要问题是html部分仅在表中。其他项目在javascript中。因此,您需要像Kajal答案一样使用request
或使用Selenium
。
硒代码:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
driver=webdriver.Chrome(chrome_options=options, executable_path=r'your path')
driver.get('https://alilauronew.forth-crs.gr/italian_b2c/npgres.exe?func=TT&tripcount=1&StartDateLeg1=22%2F02%2F2019&StartDateLeg2=22%2F02%2F2019&StartDateLeg3=22%2F02%2F2019&StartDateLeg4=22%2F02%2F2019&Leg1ilabel=NAPOLI%28BEVERELLO%29&Leg1i=BEV&Leg1iilabel=ISCHIA&Leg1ii=ISH&Leg1Date=22%2F02%2F2019&Leg2ilabel=ISCHIA&Leg2i=ISH&Leg2iilabel=NAPOLI%28BEVERELLO%29&Leg2ii=BEV&Leg2Date=22%2F02%2F2019&Leg3ilabel=NAPOLI%28BEVERELLO%29&Leg3i=BEV&Leg3iilabel=FORIO&Leg3ii=FRD&Leg3Date=22%2F02%2F2019&Leg4ilabel=FORIO&Leg4i=FRD&Leg4iilabel=NAPOLI%28BEVERELLO%29&Leg4ii=BEV&Leg4Date=22%2F02%2F2019&TotalPassengers=1&TotalVehicles=0'
)
x = driver.find_elements_by_css_selector("div#tabs-4")
alilauro_timetable = []
for div in x:
print div.text
driver.close()
输出:
| | Ven 22 Feb 2019, 07:05 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | AIRONE JET| Ven 22 Feb 2019, 08:05
| | Ven 22 Feb 2019, 07:35 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 08:35
| | Ven 22 Feb 2019, 09:40 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 10:40
| | Ven 22 Feb 2019, 10:50 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 11:50
| | Ven 22 Feb 2019, 12:55 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 13:55
| | Ven 22 Feb 2019, 14:35 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 15:35
| | Ven 22 Feb 2019, 15:35 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 16:35
| | Ven 22 Feb 2019, 17:55 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 18:55
| | Ven 22 Feb 2019, 20:20 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 21:20
| | Ven 22 Feb 2019, 06:30 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 07:30
| | Ven 22 Feb 2019, 07:10 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 08:10
| | Ven 22 Feb 2019, 08:40 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 09:40
| | Ven 22 Feb 2019, 09:35 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 10:35
| | Ven 22 Feb 2019, 11:45 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 12:45
| | Ven 22 Feb 2019, 13:20 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 14:20
| | Ven 22 Feb 2019, 14:05 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 15:05
| | Ven 22 Feb 2019, 16:15 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 17:15
| | Ven 22 Feb 2019, 16:50 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 17:50
| | Ven 22 Feb 2019, 19:10 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 20:10
| | Ven 22 Feb 2019, 07:05 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 08:30
| | Ven 22 Feb 2019, 09:40 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 11:05
| | Ven 22 Feb 2019, 10:50 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 12:15
| | Ven 22 Feb 2019, 14:35 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 16:00
| | Ven 22 Feb 2019, 17:20 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 18:45
| | Ven 22 Feb 2019, 06:45 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 08:10
| | Ven 22 Feb 2019, 09:15 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 10:35
| | Ven 22 Feb 2019, 11:20 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 12:45
| | Ven 22 Feb 2019, 13:00 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 14:20
| | Ven 22 Feb 2019, 15:55 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 17:15