I'm using Jupyter to scrape clip links from the gfycat platform, but I'm struggling to collect all of the links: the page has to be scrolled down continuously for additional clips to appear (infinite scroll).
I tried the following code (it runs, but it only returns the clips from the first screen plus one scroll down):
from selenium import webdriver
import time

driver = webdriver.Chrome('C:\\chromedriver.exe')
driver.get("https://gfycat.com/sound-gifs/search/%23funny")

last_link = ""
L = []
while True:
    # Grab every clip anchor currently rendered in the grid.
    elements = driver.find_elements_by_xpath("//div[@class='m-grid-container']//div[@class='grid-gfy-item']/a[@href]")
    actual_last_link_element = elements[len(elements) - 1]
    for element in elements:
        t = element.get_attribute("href")
        print(t)
        L.append(t)
    # Scroll the last clip into view to trigger lazy loading, then wait.
    driver.execute_script("arguments[0].scrollIntoView();", actual_last_link_element)
    time.sleep(10)
    # Stop once scrolling no longer changes the last visible link.
    actual_last_link = actual_last_link_element.get_attribute("href")
    if last_link == actual_last_link:
        print("done..")
        break
    else:
        last_link = actual_last_link
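While researching this, the pattern I keep seeing compares document.body.scrollHeight before and after each scroll instead of tracking the last link. Here is a minimal sketch of that idea adapted to my page; it reuses my URL and XPath, and it assumes the grid keeps already-loaded clips in the DOM, which I have not verified for gfycat:

from selenium import webdriver
import time

driver = webdriver.Chrome('C:\\chromedriver.exe')
driver.get("https://gfycat.com/sound-gifs/search/%23funny")

links = set()
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Collect whatever clip anchors are rendered right now.
    for a in driver.find_elements_by_xpath("//div[@class='m-grid-container']//div[@class='grid-gfy-item']/a[@href]"):
        links.add(a.get_attribute("href"))
    # Jump to the bottom of the document and give new clips time to load.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    # If the document stopped growing, assume we reached the end.
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

print(len(links), "links collected")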
I tried another version, but it didn't work well either:
from selenium import webdriver
import time

driver = webdriver.Chrome('C:\\chromedriver.exe')
driver.get("https://gfycat.com/sound-gifs/search/%23funny")

def getElements():
    return driver.find_elements_by_xpath("//div[@class='m-grid-container']//div[@class='grid-gfy-item']/a[@href]")

def scrollIntoElementView(element):
    driver.execute_script("arguments[0].scrollIntoView();", element)
    time.sleep(3)

def fetchLinks():
    global recursion_tracker
    links = set()
    while True:
        links_count = len(links)
        elements = getElements()
        for element in elements:
            link = element.get_attribute("href")
            links.add(link)
        # print(len(links), links)
        scrollIntoElementView(elements[len(elements) - 1])
        # If the set stopped growing, retry a few more scrolls before giving up.
        if links_count == len(links):
            if recursion_tracker <= 10:
                scrollIntoElementView(elements[len(elements) - 1])
                recursion_tracker += 1
                fetchLinks()
            else:
                break
    print("All links: ", links)

recursion_tracker = 0
fetchLinks()
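One more variant I considered, in case the fixed time.sleep calls are the problem: replace them with an explicit wait so the loop only continues once the grid has actually grown. This is a rough sketch, not a tested solution; the 15-second timeout and the "element count grew" termination condition are my own assumptions about how the grid signals new content:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome('C:\\chromedriver.exe')
driver.get("https://gfycat.com/sound-gifs/search/%23funny")

XPATH = "//div[@class='m-grid-container']//div[@class='grid-gfy-item']/a[@href]"
links = set()
while True:
    elements = driver.find_elements_by_xpath(XPATH)
    for element in elements:
        links.add(element.get_attribute("href"))
    # Scroll the last rendered clip into view to trigger lazy loading.
    driver.execute_script("arguments[0].scrollIntoView();", elements[-1])
    try:
        # Wait up to 15s for more clips to appear instead of sleeping blindly.
        WebDriverWait(driver, 15).until(
            lambda d: len(d.find_elements_by_xpath(XPATH)) > len(elements)
        )
    except TimeoutException:
        break  # no new clips appeared; assume we reached the end

print("All links:", len(links))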
I need help collecting all of the clip links.