Question

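I've written a script in Python using Selenium to get some specific information from a web page. Because the page is confidential, I can't disclose the site address. In any case, I expected my scraper to click each of the 20 links on the page, land on the target page, collect the information, return to the previous page, and repeat until all 20 links are exhausted. However, the scraper clicks one link, goes to the target page and parses the information, but instead of returning to the main page and repeating, it breaks. Something seems wrong with my loop. Below are a few lines from my script, which may help you suggest a workaround.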

for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".section-result"))):  # supposed to loop through all the links
    link.click()   # click each link

    name = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".section-info-text")))[2]  # the element I want to parse; the browser gets here after the click
    print(name.text)  # after parsing, the code breaks instead of getting back to the main page

Please scroll right to see the brief description attached to each line. Thanks.

Here is the error I'm getting:

line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

Answer 1

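The problem is basically this:

1. You iterate over all the links as WebElements.
2. You start the loop.
3. You click the first link, which takes you to a new page and makes that list of WebElements stale.
4. You try to keep using the stale WebElements, even though they are no longer attached to anything.

What can you do instead?

Pseudocode: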

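linkCount = getCountOfLinks()  # placeholder: count the links once, up front

for x in range(linkCount):  # range(0, linkCount-1) would skip the last link
    # Get all the links again, fresh, and pick the next one on each iteration
    link = getAllTheLinks()[x]  # placeholder: re-runs the link lookup

    link.click()

    # the rest of your stuff
    name = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".section-info-text")))[2]
    print(name.text)

    # if the click left the list page, navigate back to it here before the next iteration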
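
The pseudocode above could be fleshed out roughly as follows. This is only a sketch under assumptions, not a tested fix for the confidential page: `scrape_results`, `find_links` and `read_detail` are hypothetical names introduced here, and it assumes that the browser's back navigation (`driver.back()`) returns to the list page. The helper itself does not import Selenium; the lookups are passed in as callables.

```python
def scrape_results(driver, find_links, read_detail):
    """Visit every result link, re-locating the links on each pass.

    driver      -- a Selenium WebDriver (only its .back() method is used)
    find_links  -- hypothetical callable returning a fresh list of link elements
    read_detail -- hypothetical callable returning text from the detail page
    """
    collected = []
    total = len(find_links())  # count once, up front
    for index in range(total):
        # Re-locate on every pass: elements found before the previous
        # click are stale once the page has changed.
        links = find_links()
        links[index].click()
        collected.append(read_detail())
        # Assumption: browser "back" returns to the list page.
        driver.back()
    return collected


# Possible wiring with the selectors from the question, assuming `driver`,
# `wait`, `EC` and `By` are already set up as in the original script:
#
# find_links = lambda: wait.until(EC.presence_of_all_elements_located(
#     (By.CSS_SELECTOR, ".section-result")))
# read_detail = lambda: wait.until(EC.presence_of_all_elements_located(
#     (By.CSS_SELECTOR, ".section-info-text")))[2].text
# for text in scrape_results(driver, find_links, read_detail):
#     print(text)
```

Because `find_links` goes through `wait.until(...)` in that wiring, each pass also waits for the list to be present again after navigating back, rather than indexing into it immediately.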

Answer 2

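If the code doesn't return to the main page, you may need to execute a command that goes back to the previous (main) page, such as some kind of back button. I'm no Selenium expert, but I've used Protractor (a JavaScript wrapper around Selenium) and have seen a similar problem.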

How do I prevent my script from breaking after the first loop iteration?
