Question

我想在动态href上循环。确实，我每页下载了一组文件。在每个页面上，我下载100个文本文件，但我必须下载200 000个文件。因此，我必须在2000年单击next按钮。为此，我获得了next按钮的href地址，但不幸的是，此链接中两个对象发生了更改，页码1,2,3等，以及一个字符串字符。请查看附有更改的下一个按钮的示例。

我是Python的新用户。我的水平很差。

#Before I add selenium setup for scraping. 

n=2000

for i in range(1,n):
    href="https://search.proquest.com/something/715376F5A5AF44BBPQ/" + str(i) + "?accountid=12543#scrollTo"
    driver.get(href)

#Here, I add the code which allows downloading for each page.

Answer 1

我无法使用示例链接（我无法注册）

第一..

什么是“字符串的汉字” ？

书号？或类别编号？

如果只是随机字符串，我认为您应该找到另一种方法。

使用ActionChain怎么样？或driver.execute_script()？

首先，我认为，（从.js或.html中）找到字符串的含义更为重要。

Answer 2

@나민오我需要帮助来识别下一页按钮的xpath。我的目标是遍历Python Selenium中的页面。检查这张图片的URL页面后，请在下一页按钮的代码下面找到。

next page button picture after inspect

我尝试用selenium在python中编写以下代码，以逐页下载文件。

while True:

scraping()          # here I call my function that allows to download the files per page

try:
    #Checks if there are more pages with links
    next_link = driver.find_element_by_xpath("//*[@title='Page suivante']")
    drive.execute_script("arguments[0].scrollIntoView();", next_link)
    next_link.click()
     #Time sleep
    time.sleep(20)  
except NoSuchElementException:
    pages_rows= False

如何在python中使用硒在动态href链接上循环？

2 个答案: