Question

I am writing a Python program to collect links from websites. The code is:

links = driver.find_elements_by_xpath('//*[@href]')
for link in links:
    print(link.get_attribute('href'))
time.sleep(1)

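I tried it on several sites and it works well. The problem appears when I use it on one particular site (www.ifood.com.br): it collects some links and then returns some errors. I am a Python beginner, so I don't know what they mean. Please, I need some help.

Code output: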

https://d1jgln4w9al398.cloudfront.net/imagens/ce/wl/www.ifood.com.br/favicon.ico
https://d1jgln4w9al398.cloudfront.net/site/2.1.238-20181023.22/css/main.css
https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800
https://www.ifood.com.br/

Traceback (most recent call last):
  File "C:\Users\jorda\Desktop\Python-Projetos\digitar ifood.py", line 32, in <module>
    print(link.get_attribute('href'))
  File "C:\Users\jorda\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 143, in get_attribute
    resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
  File "C:\Users\jorda\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\jorda\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\jorda\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=70.0.3538.77)
  (Driver info: chromedriver=2.42.591088 (7b2b2dca23cca0862f674758c9a3933e685c27d5),platform=Windows NT 10.0.17134 x86_64)

Answer 1

In the error log, you can see:

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

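This usually happens when you try to interact with a web element that no longer exists in the DOM. A typical scenario looks like this:

1. You open a web page.
2. You find some elements and save them in a variable.
3. The page DOM changes (for example, the page reloads).
4. You still see the same page, but from Selenium's point of view the elements from step 2 are stale.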
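
One way to cope with this scenario is to repeat the lookup whenever a saved element turns out to be stale. The sketch below is only an illustration, not part of the original answer: the helper name collect_hrefs and the attempts parameter are made up for this example, and the import fallback exists only so the snippet can be tried without Selenium installed. It passes the locator strategy as the plain string "xpath" (the value of Selenium's By.XPATH) to driver.find_elements.

```python
try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Fallback so the sketch can be read and exercised without Selenium installed.
    class StaleElementReferenceException(Exception):
        pass


def collect_hrefs(driver, attempts=3):
    """Return every href on the current page, re-locating elements if they go stale.

    `collect_hrefs` and `attempts` are hypothetical names used for illustration.
    """
    for attempt in range(attempts):
        try:
            # Locate the elements (step 2 of the scenario).
            links = driver.find_elements("xpath", "//*[@href]")
            # Reading an attribute raises if the DOM changed in the meantime.
            return [link.get_attribute("href") for link in links]
        except StaleElementReferenceException:
            if attempt == attempts - 1:
                raise  # still stale on the last attempt, so surface the error
    return []  # only reached when attempts is 0 or negative
```

With a real driver this would be called as collect_hrefs(driver) after driver.get(...). Retrying hides the symptom rather than the cause, so it is best treated as a fallback.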

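So, in your case, you can try making sure the page has fully loaded (i.e. the DOM is no longer changing) before calling find_elements_by_xpath. The simplest way to check whether this solves the problem is to add a sleep before that call: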

import time

time.sleep(5)  # crude check: give the page time to finish loading
links = driver.find_elements_by_xpath('//*[@href]')
for link in links:
    print(link.get_attribute('href'))

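Note that using sleep is not recommended. Even if 5 seconds works now, there is no guarantee that it won't break your test at some point (for example, on a slow connection). Instead, use a smart wait condition that repeatedly checks whether the page has loaded and continues only once it has. More details can be found here: Python Selenium stale element fix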
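
As a rough illustration of such a wait condition, the sketch below polls the browser's document.readyState until it reports "complete". It is an assumption-laden example rather than the linked answer's code: wait_for_page_loaded, timeout and poll are hypothetical names, and the loop is written by hand only to make the logic visible (Selenium's own WebDriverWait serves the same purpose). A "complete" ready state also does not rule out later DOM changes made by scripts, so a site such as the one in the question may need a more specific condition.

```python
import time


def wait_for_page_loaded(driver, timeout=10.0, poll=0.5):
    """Poll until the browser reports the document as loaded, or raise TimeoutError.

    `wait_for_page_loaded`, `timeout` and `poll` are hypothetical names.
    """
    deadline = time.monotonic() + timeout
    while True:
        # execute_script runs JavaScript in the page and returns its result.
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("page did not finish loading in %s seconds" % timeout)
        time.sleep(poll)  # short pause between checks instead of one long sleep


# Intended usage with a real driver (not executed here):
# wait_for_page_loaded(driver)
# links = driver.find_elements("xpath", "//*[@href]")
```

Unlike a fixed sleep, this returns as soon as the condition holds and fails loudly when it never does.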

Unable to collect links from a website (Python)
