Question

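I've created a scraper in Python, in combination with Selenium, to collect some information from a site. However, the problem I'm facing is that after collecting a single lead, the scraper throws the error "element is not attached to the page document".

Consider the code below. This is what it is supposed to do: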

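The for loop covers 20 names, and the scraper should click on each of them.
After clicking the first name, it waits for the document to be available on the new page.
In the top right corner of that page there is a "show more" button; clicking it reveals hidden information. (The browser stays on the second page; only the new information becomes visible.)
Once the information is displayed, the scraper collects it successfully.
It should then go back to the starting page, where the loop began, and move on to the next name to click. Instead of clicking the next name, however, it throws the error shown further down (on the link.click() line).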

I tried using wait.until(EC.staleness_of(item)) to get rid of the stale element error, but it did not work.

for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"div.presence-entity__image"))):
    link.click() #error thrown here
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"button[data-control-name='contact_see_more']"))).click()
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".pv-contact-info__ci-container a[href^='mailto:']")))
    print(item.get_attribute("href"))
    driver.execute_script("window.history.go(-1)")
    wait.until(EC.staleness_of(item))

The error I'm getting:

line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

I've tried to describe what is happening. Any help on this will be highly appreciated.

Answer 1

Instead of clicking each link in a loop, it is better to collect all the link URLs first and then navigate to each of them in a loop:

links = [link.get_attribute('href') for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a.mn-person-info__picture.ember-view")))]
for link in links:
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"button[data-control-name='contact_see_more']"))).click()
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".pv-contact-info__ci-container a[href^='mailto:']")))
    print(item.get_attribute("href"))
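
Collecting URLs first works because href values are plain strings, which stay valid after the browser leaves the page, whereas the WebElement references found on the first page go stale as soon as the page is navigated away from or reloaded. As a minimal sketch, the list comprehension above could be hardened a little: get_attribute can return None when an element has no href, and the same profile may appear more than once. The helper name collect_hrefs is hypothetical and not part of Selenium; it only relies on each element having a get_attribute method.

```python
def collect_hrefs(elements):
    """Return the unique, non-empty href strings of `elements`, in page order.

    `elements` is any iterable of objects with a get_attribute(name) method,
    such as the list returned by EC.presence_of_all_elements_located.
    """
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")  # a plain str, or None if absent
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


# Possible usage with the `wait` object from the snippet above (assumed to exist):
# links = collect_hrefs(wait.until(EC.presence_of_all_elements_located(
#     (By.CSS_SELECTOR, "a.mn-person-info__picture.ember-view"))))
```

The returned list can then be iterated with driver.get(link) exactly as in the loop above.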

Note that to get all the links you might need to scroll the Connections page down so that more connections are loaded via XHR.
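
One way to do that scrolling, sketched under assumptions: keep scrolling to the bottom until the page height stops growing. The function name scroll_until_stable and the pause and max_rounds defaults are hypothetical; the only Selenium call used is driver.execute_script, which also appears in the question. Whether one second is long enough for the XHR to finish depends on the site and the connection.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed. Stops after `max_rounds`
    scrolls even if the page is still growing.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new was loaded by this scroll
        last_height = new_height
    return max_rounds
```

Calling scroll_until_stable(driver) before collecting the links should make more of the lazily loaded entries available to the selector.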

My scraper throws an error instead of continuing
