I've written a script in Python using Selenium to scrape the name and reputation from the landing page with the get_names() function, and then click through the links of different posts to reach their inner pages in order to parse the title from there with the get_additional_info() function.
All the information I wish to parse is available on the landing page as well as on the inner pages, and none of it is dynamic, so Selenium is definitely overkill here. However, my intention is to use Selenium to scrape information from two different depths in a single pass.
In the script below, if I comment out the name and rep lines, I can see that the script clicks through the links on the landing page and parses the title from the inner pages flawlessly.
However, when I run the script as is, I get a selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document error pointing at the name = item.find_element_by_css_selector(".user-details > a").text line.
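As far as I understand, the exception means a WebElement handle is only valid for the document it was located in, and driver.back() rebuilds that document. A minimal stand-alone sketch (my own illustration, not part of the script below) that reproduces the effect:

from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException

driver = webdriver.Chrome()
driver.get('https://stackoverflow.com/questions/tagged/web-scraping')
link = driver.find_element_by_css_selector('.question-hyperlink')  # handle into the current document
driver.get(link.get_attribute('href'))  # navigate away...
driver.back()                           # ...and back: the document is rebuilt from scratch
try:
    print(link.text)  # the old handle still points at a node of the discarded document
except StaleElementReferenceException:
    print('stale: the element belongs to the previous page document')
driver.quit()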
How can I get rid of this error while keeping the logic I've already implemented?
This is what I've tried so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

lead_url = 'https://stackoverflow.com/questions/tagged/web-scraping'

def get_names():
    driver.get(lead_url)
    # `item` handles are located once, on the first page load
    for count, item in enumerate(wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".summary")))):
        # the question links are re-located on every pass, so indexing them by count stays valid
        usableList = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".summary .question-hyperlink")))
        name = item.find_element_by_css_selector(".user-details > a").text  # raises StaleElementReferenceException after the first driver.back()
        rep = item.find_element_by_css_selector("span.reputation-score").text
        driver.execute_script("arguments[0].click();", usableList[count])
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1 > a.question-hyperlink")))
        title = get_additional_info()
        print(name, rep, title)
        driver.back()
        wait.until(EC.staleness_of(usableList[count]))

def get_additional_info():
    title = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1 > a.question-hyperlink"))).text
    return title

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 5)
    get_names()
Answer 0 (score: 2)
Broadly sticking with your design pattern: forget item. Use count to index into a list of elements freshly extracted from the current page_source, e.g. driver.find_elements_by_css_selector(".user-details > a")[count].text
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

lead_url = 'https://stackoverflow.com/questions/tagged/web-scraping'

def get_names():
    driver.get(lead_url)
    for count, item in enumerate(wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".summary")))):
        usableList = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".summary .question-hyperlink")))
        # re-find the elements on each pass and index with count, instead of touching the stale `item`
        name = driver.find_elements_by_css_selector(".user-details > a")[count].text
        rep = driver.find_elements_by_css_selector("span.reputation-score")[count].text
        driver.execute_script("arguments[0].click();", usableList[count])
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1 > a.question-hyperlink")))
        title = get_additional_info()
        print(name, rep, title)
        driver.back()
        wait.until(EC.staleness_of(usableList[count]))

def get_additional_info():
    title = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1 > a.question-hyperlink"))).text
    return title

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 5)
    get_names()
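A note on the pattern: another way to dodge staleness entirely is to harvest everything you need from the landing page, including each question's href, before navigating anywhere, so there is no driver.back() at all. A rough sketch of that variant, reusing lead_url, driver, wait, and get_additional_info() from above (get_names_no_staleness is a hypothetical name, not part of the original answer):

def get_names_no_staleness():
    driver.get(lead_url)
    rows = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".summary")))
    records = []
    for row in rows:  # read everything while the landing page is still live
        name = row.find_element_by_css_selector(".user-details > a").text
        rep = row.find_element_by_css_selector("span.reputation-score").text
        href = row.find_element_by_css_selector(".question-hyperlink").get_attribute("href")
        records.append((name, rep, href))
    for name, rep, href in records:  # plain strings cannot go stale
        driver.get(href)
        print(name, rep, get_additional_info())

Also note that the find_element_by_* / find_elements_by_* helpers were deprecated in Selenium 4 and later removed; on current versions, use driver.find_elements(By.CSS_SELECTOR, ...) instead.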