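I am using Selenium to scrape this website. First, I click the Clear button next to Attraction Type. Then I click the More link at the bottom of the category list. Next, I find each category element by its ID and click the link inside it. The problem is that when I click the first category, Outdoor Activities, the site goes back to its initial state, and when I try to click the next link I get the following error: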
StaleElementReferenceException: Message: Element is no longer attached to the DOM
My code:
import time

from scrapy.spiders import CrawlSpider
from selenium import webdriver


class TripSpider(CrawlSpider):
    name = "tspider"
    allowed_domains = ["tripadvisor.ca"]
    start_urls = ['http://www.tripadvisor.ca/Attractions-g147288-Activities-c42-Dominican_Republic.html']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()
        self.driver.maximize_window()

    def parse(self, response):
        self.driver.get(response.url)
        self.driver.find_element_by_class_name('filter_clear').click()
        time.sleep(3)
        self.driver.find_element_by_class_name('show').click()
        time.sleep(3)
        # To handle popups: switch to the newest window
        self.driver.switch_to.window(self.driver.window_handles[-1])
        # Close the popup window
        self.driver.close()
        # Switch back to the original browser window (first window)
        self.driver.switch_to.window(self.driver.window_handles[0])
        divs = self.driver.find_elements_by_xpath('//div[contains(@id,"ATTR_CATEGORY")]')
        for d in divs:
            d.find_element_by_tag_name('a').click()
            time.sleep(3)
Answer:
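The problem with this site in particular is that the DOM changes every time you click an element, so the elements you collected before the click become stale and you can't keep iterating over them.
I ran into the same problem a long time ago and solved it by opening each link in a separate window.
You can change this part of the code: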
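A common alternative to keeping the old references is to re-locate the list before every click and take only its length from the first lookup. The sketch below is duck-typed so it does not depend on Selenium; `click_each_fresh`, `find_all`, and `reset` are hypothetical names. With Selenium, `find_all` could be `lambda: driver.find_elements(By.XPATH, '//div[contains(@id,"ATTR_CATEGORY")]//a')` (with `from selenium.webdriver.common.by import By`), and `reset` would be whatever steps bring the category list back; on this site that is assumed to mean clicking Clear and More again.

```python
from typing import Callable, Protocol, Sequence


class Clickable(Protocol):
    """Anything with a click() method, e.g. a Selenium WebElement."""

    def click(self) -> None: ...


def click_each_fresh(
    find_all: Callable[[], Sequence[Clickable]],
    reset: Callable[[], None],
) -> int:
    """Click the i-th element for each i, re-locating the list before every click.

    Only the count comes from the first lookup; element references are never
    reused after a click, so a DOM rebuild cannot leave us holding stale ones.
    Returns the number of elements actually clicked.
    """
    total = len(find_all())
    clicked = 0
    for i in range(total):
        elements = find_all()  # fresh references for the current DOM
        if i >= len(elements):
            break  # the list got shorter after a reset; stop instead of failing
        elements[i].click()
        clicked += 1
        reset()  # hypothetical: restore the page so the list is visible again
    return clicked
```

The trade-off is one extra lookup per click, and it relies on the list coming back in the same order after each reset.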
divs = self.driver.find_elements_by_xpath('//div[contains(@id,"ATTR_CATEGORY")]')
for d in divs:
    d.find_element_by_tag_name('a').click()
    time.sleep(3)
into this:
from selenium.webdriver.common.keys import Keys

mainWindow = self.driver.current_window_handle
divs = self.driver.find_elements_by_xpath('//div[contains(@id,"ATTR_CATEGORY")]')
for d in divs:
    # Open the link in a new window (Shift+Enter) so the main page's DOM is not rebuilt
    d.find_element_by_tag_name('a').send_keys(Keys.SHIFT + Keys.ENTER)
    self.driver.switch_to.window(self.driver.window_handles[1])
    # Here you do whatever you want in the new window
    # Close the new window and return to the main one, whose elements are still valid
    self.driver.close()
    self.driver.switch_to.window(mainWindow)
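A variant of the same idea avoids keyboard shortcuts, whose effect varies between browsers. It assumes the category links carry real `href` URLs, which has not been verified for this site. The approach reads every `href` as a plain string first, since strings cannot go stale, and then opens each URL in a fresh tab. `driver.switch_to.new_window("tab")` requires Selenium 4 or later. `collect_hrefs` and `visit_in_new_tabs` are hypothetical helper names, and `handle_page` stands for whatever scraping you do on each page.

```python
from typing import Callable, Iterable, List, Optional


def collect_hrefs(anchors: Iterable) -> List[str]:
    """Read each anchor's href up front; plain strings cannot go stale."""
    hrefs = []
    for a in anchors:
        href: Optional[str] = a.get_attribute("href")
        if href:  # skip anchors that have no href attribute
            hrefs.append(href)
    return hrefs


def visit_in_new_tabs(driver, hrefs: Iterable[str], handle_page: Callable) -> None:
    """Open each URL in its own tab, process it, then return to the main window."""
    main = driver.current_window_handle
    for href in hrefs:
        driver.switch_to.new_window("tab")  # Selenium 4+: opens and focuses a new tab
        driver.get(href)
        handle_page(driver)  # hypothetical per-page scraping callback
        driver.close()  # closes only the current (new) tab
        driver.switch_to.window(main)
```

Usage would look like `visit_in_new_tabs(driver, collect_hrefs(driver.find_elements(By.XPATH, '//div[contains(@id,"ATTR_CATEGORY")]//a')), lambda drv: print(drv.title))`. Because the main window never navigates, nothing in it is rebuilt between iterations.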