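I have written a script in Python using Selenium to handle a web page with infinite scrolling. The problem is that it scrolls a few times and then quits the browser; it never reaches the bottom. I also tried an explicit wait, but then it scrolled even fewer times. How can I keep scrolling until there is nothing more to load, so that I reach the bottom?
This is my attempt: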
import time
from selenium import webdriver
from urllib.parse import urljoin
url = "https://www.instagram.com/explore/tags/travelphotoawards/"
driver = webdriver.Chrome()
driver.get(url)
last_len = len(driver.find_elements_by_css_selector(".v1Nh3 a"))
new_len = last_len
while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    items = driver.find_elements_by_css_selector(".v1Nh3 a")
    new_len = len(items)
    if last_len == new_len:
        break
driver.quit()
Edit:
If I try it as below, I can scroll as many times as I like, but hardcoding the number of scrolls is not a good idea:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
url = "https://www.instagram.com/explore/tags/travelphotoawards/"
driver = webdriver.Chrome()
driver.get(url)
for scroll in range(1,10): #I can do the scrolling as many times as I want but it is fully hardcoded
    item = driver.find_element_by_tag_name("body")
    item.send_keys(Keys.END)
    elems = driver.find_elements_by_css_selector(".v1Nh3 a")
    time.sleep(3)
driver.quit()
I am hoping there is a way to scroll automatically until the bottom is reached.
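Answer 0 (score: 3)
A few things here. With infinite scrolling, there are several things I would watch out for.
Below is an updated script that should work better for you. Do keep in mind that nothing is perfect, so you will need to adapt the script to cope with failures.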
import time
from selenium import webdriver
from urllib.parse import urljoin
option = webdriver.ChromeOptions()
chrome_prefs = {}
option.experimental_options["prefs"] = chrome_prefs
chrome_prefs["profile.default_content_settings"] = {"images": 2}
chrome_prefs["profile.managed_default_content_settings"] = {"images": 2}
driver = webdriver.Chrome(chrome_options=option)
url = "https://www.instagram.com/explore/tags/travelphotoawards/"
driver.get(url)
last_len = len(driver.find_elements_by_css_selector(".v1Nh3 a"))
new_len = last_len
consistent = 0
while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    items = driver.find_elements_by_css_selector(".v1Nh3 a")
    new_len = len(items)
    if last_len == new_len:
        consistent += 1
        if consistent == 3:
            break
    else:
        consistent = 0
driver.quit()
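The stopping rule in the script above (give up only after the item count has stayed the same several times in a row) can be separated from the browser calls. The following is a minimal sketch of that idea, not the answer author's code: `scroll_until_stable`, `patience`, `max_scrolls`, and the `FakePage` stand-in are hypothetical names introduced here so the logic can be run without Selenium.

```python
import time
from typing import Callable, List


def scroll_until_stable(scroll: Callable[[], None],
                        count_items: Callable[[], int],
                        patience: int = 3,
                        pause: float = 5.0,
                        max_scrolls: int = 1000) -> int:
    """Scroll until the item count is unchanged `patience` times in a row.

    `scroll` and `count_items` are supplied by the caller, so this function
    makes no assumptions about the browser driver. Returns the number of
    scrolls performed. `max_scrolls` is a safety cap (an assumption added
    for this sketch) in case the page never stops growing.
    """
    last_count = count_items()
    unchanged = 0
    scrolls = 0
    while unchanged < patience and scrolls < max_scrolls:
        scroll()
        scrolls += 1
        time.sleep(pause)  # give the page time to load the next batch
        new_count = count_items()
        if new_count == last_count:
            unchanged += 1   # nothing new this time; maybe just slow
        else:
            unchanged = 0    # new items appeared; reset the counter
        last_count = new_count
    return scrolls


class FakePage:
    """Hypothetical stand-in for a browser page, used only for illustration.

    Each scroll adds the next number from `batches` to the item count;
    a 0 simulates a scroll during which nothing finished loading.
    """

    def __init__(self, batches: List[int]) -> None:
        self.batches = list(batches)
        self.items = 0

    def scroll(self) -> None:
        if self.batches:
            self.items += self.batches.pop(0)

    def count(self) -> int:
        return self.items
```

With a real driver, the two callables would wrap the calls already used in the script, for example `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `lambda: len(driver.find_elements_by_css_selector(".v1Nh3 a"))`.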
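Answer 1 (score: 2)
Every time you scroll, older images disappear from the page, so after scrolling you may get the same number of images or even fewer.
Each image has a unique href, so you can compare the href of the last image with the one from before the scroll.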
last_href = driver.find_elements_by_css_selector('.v1Nh3 > a')[-1].get_attribute('href')
new_href = last_href
while True:
    last_href = new_href
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    new_href = driver.find_elements_by_css_selector('.v1Nh3 > a')[-1].get_attribute('href')
    # Stop once the last link no longer changes, i.e. nothing new was loaded
    if last_href == new_href:
        break
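Because older images drop out of the page as you scroll, it may also be useful to record the links while scrolling rather than only at the end. The sketch below illustrates this under stated assumptions: `collect_hrefs`, `max_scrolls`, and the `FakeFeed` stand-in are hypothetical names, and the page is assumed to advance by no more than its visible window per scroll, so no link is skipped between checks.

```python
import time
from typing import Callable, Dict, List, Optional


def collect_hrefs(scroll: Callable[[], None],
                  visible_hrefs: Callable[[], List[str]],
                  pause: float = 5.0,
                  max_scrolls: int = 1000) -> List[str]:
    """Scroll until the last visible href stops changing; return all hrefs seen.

    Links are accumulated in first-seen order, so ones that have already
    been removed from the page are still part of the result.
    """
    seen: Dict[str, None] = {}  # dict keeps insertion order
    hrefs = visible_hrefs()
    for href in hrefs:
        seen.setdefault(href, None)
    last_href: Optional[str] = hrefs[-1] if hrefs else None

    for _ in range(max_scrolls):
        scroll()
        time.sleep(pause)  # give the page time to load the next batch
        hrefs = visible_hrefs()
        for href in hrefs:
            seen.setdefault(href, None)
        new_href = hrefs[-1] if hrefs else None
        if new_href == last_href:
            break  # the last link did not change: nothing new was loaded
        last_href = new_href
    return list(seen)


class FakeFeed:
    """Hypothetical stand-in for a page that keeps only `window` links in the DOM."""

    def __init__(self, total: int, window: int, step: int) -> None:
        self.total, self.window, self.step = total, window, step
        self.loaded = min(window, total)

    def scroll(self) -> None:
        self.loaded = min(self.loaded + self.step, self.total)

    def visible_hrefs(self) -> List[str]:
        start = max(0, self.loaded - self.window)
        return [f"/p/{i}/" for i in range(start, self.loaded)]
```

With a real driver, `visible_hrefs` would wrap the selector from the answer, for example `lambda: [a.get_attribute('href') for a in driver.find_elements_by_css_selector('.v1Nh3 > a')]`. A single slow load can still end the loop early, so requiring several unchanged checks in a row may be worth adding.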