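I have written a script in Python with Selenium to get the number of answers from the landing page and the asker's name from the inner page of each question. I know it would be easier to scrape both items using the question links and the next-page link, but that is not what I want to do here. The bottom line is that I am trying to traverse the different pages using clicks only. However, when I run the script, on the second iteration it fails at the line answer = WebDriverWait(item,10) and throws the following error: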
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
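Although the elements I am looking for are available on both the landing page and the inner pages, it is a requirement that I scrape the two items from two different depths.
I know how to scrape them using requests, so I am not willing to go that route either.
The script I am trying: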
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = 'https://stackoverflow.com/questions/tagged/web-scraping'

def get_content(link):
    driver.get(link)
    while True:
        for count,item in enumerate(WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".question-summary")))):
            # error thrown in the following line in its second iteration
            answer = WebDriverWait(item,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"[class$='answered'] > strong"))).text
            elem = driver.find_elements_by_css_selector(".summary a.question-hyperlink")[count]
            driver.execute_script("arguments[0].click();",elem)
            name = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"h1[itemprop='name'] > a"))).text
            print(answer,name)
            driver.back()
        try:
            next_page = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"a[rel='next']")))
            driver.execute_script("arguments[0].click();",next_page)
        except Exception:
            break

if __name__ == '__main__':
    with webdriver.Chrome() as driver:
        get_content(link)
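How can I scrape the two items from two different depths?
PS: If I remove the line answer = WebDriverWait(item,10)----, the script works like a charm, traversing different depths and multiple pages.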
Answer 0 (score: 2)
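Getting a StaleElementReferenceException here is expected: you navigate away from the page, so the references to the .question-summary elements collected before the click are lost. Coming back with driver.back() loads a fresh document, and the old references do not point into it.
Error description: Thrown when a reference to an element is now "stale".
To make this work, the code below will do. I changed the [class$='answered'] > strong selector to [class*='answered'] > strong, because otherwise you get an error when a question has an accepted answer. If you only want questions without an accepted answer, adjust the script accordingly.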
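To illustrate the selector change, here is a minimal sketch of how the two CSS attribute operators differ, modelled with plain string checks. The helper names and the sample class values ("status answered", "status answered-accepted", "status unanswered") are assumptions made for the example and are not taken from the question; check the actual markup of the page before relying on them.

```python
def matches_suffix(class_attr, value):
    """Rough model of [class$='value']: the attribute must END with value."""
    return class_attr.endswith(value)


def matches_substring(class_attr, value):
    """Rough model of [class*='value']: the attribute must CONTAIN value."""
    return value in class_attr


# Hypothetical class attribute values, assumed for illustration only.
samples = ["status answered", "status answered-accepted", "status unanswered"]

for class_attr in samples:
    print(
        class_attr,
        "| $= matches:", matches_suffix(class_attr, "answered"),
        "| *= matches:", matches_substring(class_attr, "answered"),
    )
```

Under these assumptions, only the "contains" form matches a class value that carries an extra suffix such as "-accepted", which is consistent with the failure described for questions that have an accepted answer.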
def get_content(link):
    driver.get(link)
    while True:
        # Only remember how many questions are on the page, not the elements themselves
        count = len(WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".question-summary"))))
        for ix in range(count):
            # Look the question up again on every iteration, so the reference is never stale
            question = driver.find_elements_by_css_selector(".question-summary")[ix]
            answers_count = question.find_element_by_css_selector("[class*='answered'] > strong").text
            driver.execute_script("arguments[0].click();", question.find_element_by_css_selector("a.question-hyperlink"))
            name = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "h1[itemprop='name'] > a"))).text
            print(answers_count, name)
            driver.back()
        try:
            next_page = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "a[rel='next']")))
            driver.execute_script("arguments[0].click();", next_page)
        except Exception:
            break
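The idea behind this fix can be separated from Selenium: remember only the number of items, then look each item up again by index right before using it. The sketch below is a generic version of that pattern. The names visit_each, find_items and visit are hypothetical helpers introduced for illustration and are not part of Selenium.

```python
def visit_each(find_items, visit):
    """Call visit() on every item, re-locating the item by index each time.

    find_items: callable returning the current list of items (a fresh lookup).
    visit: callable that handles one item; it may navigate away and back,
           which would invalidate any references obtained earlier.
    """
    count = len(find_items())  # keep only the count, never the references
    results = []
    for ix in range(count):
        items = find_items()  # fresh lookup on every iteration
        if ix >= len(items):
            break  # the page now holds fewer items than before; stop early
        results.append(visit(items[ix]))
    return results
```

With Selenium, find_items could be something like lambda: driver.find_elements(By.CSS_SELECTOR, ".question-summary"), and visit would read the answer count, click through, read the name and call driver.back(). That wiring is an assumption about how you would plug it in, not a tested drop-in replacement for the function above.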