Scraping a multi-page table with Selenium and Python

Time: 2018-12-17 09:50:24

Tags: python windows web-scraping

I need to scrape an online airline booking website.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 3 style: the chromedriver path is passed positionally.
driver = webdriver.Chrome('C:\\Users\\HP\\Downloads\\Compressed\\chromedriver_win32_2\\chromedriver.exe')
driver.set_page_load_timeout(30)

driver.get("https://matrix.itasoftware.com/")
driver.maximize_window()

# Creating a WebDriverWait does not pause anything by itself;
# it only waits when until() is called on it.
wait = WebDriverWait(driver, 30)

# Origin airport
driver.find_element(By.ID, "cityPair-orig-0").send_keys("BOM")
driver.find_element(By.CLASS_NAME, "relative").click()

# Destination airport
driver.find_element(By.ID, "cityPair-dest-0").send_keys("AMS")
driver.find_element(By.CLASS_NAME, "relative").click()

# Outbound date
driver.find_element(By.ID, "cityPair-outDate-0").send_keys("01/01/2019")
driver.find_element(By.CLASS_NAME, "relative").click()

# Second date. Note: this targets the same "cityPair-outDate-0" field again,
# so the text is appended to the outbound date rather than set on a return-date field.
driver.find_element(By.ID, "cityPair-outDate-0").send_keys("02/01/2019")
driver.find_element(By.CLASS_NAME, "relative").click()

# Wait until the search button is clickable, then run the search.
search = wait.until(EC.element_to_be_clickable((By.ID, "searchButton-0")))
search.click()

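When I run the code, the page I get does not match the actual website. The actual site shows a multi-page table (17 pages), but when I run the code the table shrinks to a single page. As a result, I cannot get the full list of airlines after scraping.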

This is the actual website, showing 17 pages at the bottom right.

This is what I get after running the code. It collapses all 17 pages into one and skips airlines in between, so many airlines are missing from the page I end up scraping.
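For reference, here is a minimal sketch of how the rows of a paginated table could be gathered page by page once the pager is actually visible in the automated session. It does not explain why the session above shows only one page. The function names (`collect_all_pages`, `selenium_callbacks`) and the `row_css` / `next_css` selectors are hypothetical placeholders, not taken from the site; the real selectors would have to be read from the page's HTML.

```python
from typing import Callable, Iterable, List


def collect_all_pages(
    read_rows: Callable[[], Iterable[str]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 100,
) -> List[str]:
    """Gather rows from every page of a paginated table.

    read_rows       -- returns the rows visible on the current page
    go_to_next_page -- advances one page; returns False when there is none
    max_pages       -- safety cap so a broken pager cannot loop forever
    """
    seen = set()
    collected = []
    for _ in range(max_pages):
        for row in read_rows():
            if row not in seen:  # skip rows repeated across pages
                seen.add(row)
                collected.append(row)
        if not go_to_next_page():
            break
    return collected


def selenium_callbacks(driver, row_css, next_css):
    """Build the two callbacks for a Selenium driver.

    row_css and next_css are hypothetical CSS selectors for the table
    rows and the "next page" control; they are assumptions, not the
    site's real markup.
    """
    # Imported lazily so the generic helper above works without Selenium.
    from selenium.webdriver.common.by import By

    def read_rows():
        return [el.text for el in driver.find_elements(By.CSS_SELECTOR, row_css)]

    def go_to_next_page():
        links = driver.find_elements(By.CSS_SELECTOR, next_css)
        if not links or not links[0].is_enabled():
            return False
        links[0].click()
        # A real script would also wait here for the table to refresh
        # before the next read_rows() call.
        return True

    return read_rows, go_to_next_page
```

With a live driver this would be called as `collect_all_pages(*selenium_callbacks(driver, row_css, next_css))`. Keeping the paging loop separate from the Selenium calls means the loop can be checked with plain Python lists before pointing it at the site.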

0 answers:

No answers yet