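I am trying to get all the hotels, but even though I run a scroll-down script, my page_source only contains the HTML for 11 hotels, i.e. what was loaded initially.
How do I get the full page source after scrolling down, so that it includes all the hotels?
If driver.execute_script does load the whole page, how do I store the full page source in a variable?
PS: This is for educational purposes only.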
from selenium import webdriver
import re
import pandas as pd
import time
chrome_path = r"C:\Users\ajite\Desktop\web scraping\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://www.makemytrip.com/mmthtl/site/hotels/search?checkin=02252018&checkout=02262018&roomStayQualifier=1e0e&city=GOI&searchText=Goa,%20India&country=IN')
driver.implicitly_wait(3)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
two_hotels = driver.find_elements_by_xpath('//*[@id="hotel_card_list"]/div')
Answer 0 (score: 1)
Your scroll is not taking effect. Instead of:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
you should try:
for i in range(0,25): # here you will need to tune to see exactly how many scrolls you need
    driver.execute_script('window.scrollBy(0, 400)')
    time.sleep(1)
The code I tried:
import selenium
import time
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.makemytrip.com/mmthtl/site/hotels/search?checkin=02252018&checkout=02262018&roomStayQualifier=1e0e&city=GOI&searchText=Goa,%20India&country=IN")
driver.implicitly_wait(3)
for i in range(0,25): # here you will need to tune to see exactly how many scrolls you need
    driver.execute_script('window.scrollBy(0, 400)')
    time.sleep(1)
time.sleep(10) #more time so the cards will load
two_hotels = driver.find_elements_by_xpath('//*[@id="hotel_card_list"]/div')
two_hotels
There are more values now.
With range(0, 25) for i, I got 42 hotel entries. I think you will need to tune these values to get everything you need.
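Instead of tuning the number of scrolls by hand, one option is to keep scrolling until the number of hotel cards stops growing. The following is only a minimal sketch under some assumptions: scroll_until_stable and COUNT_HOTEL_CARDS are hypothetical names introduced here, the CSS selector is derived from the XPath used above, and the step, pause and patience values are guesses that may need adjusting for this site.

```python
import time


def scroll_until_stable(driver, count_script, step=400, pause=1.0,
                        patience=5, max_scrolls=200):
    """Scroll down in small steps until the item count stops growing.

    driver:       any object with a Selenium-style execute_script() method
    count_script: JavaScript that returns the number of items loaded so far
    patience:     consecutive scrolls without new items before stopping
    max_scrolls:  hard upper bound so the loop always terminates
    """
    last_count = driver.execute_script(count_script)
    stalled = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)  # give lazily loaded cards time to appear
        count = driver.execute_script(count_script)
        if count > last_count:
            last_count = count
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return last_count


# Assumed selector, mirroring the XPath //*[@id="hotel_card_list"]/div
COUNT_HOTEL_CARDS = (
    "return document.querySelectorAll('#hotel_card_list > div').length;"
)
```

With the driver from the code above, this could be used as total = scroll_until_stable(driver, COUNT_HOTEL_CARDS), followed by html = driver.page_source. Reading page_source only after the scrolling has finished is what puts the extra cards into the variable, since page_source reflects the DOM at the moment it is read.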