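I'm trying to get the HTML of a page that loads only about 33% of the time. My strategy is to keep refreshing the page until it finally loads.
I call this function from another one in which I have already started my driver (edited to wrap the while statement in a try/except block, as suggested by @jouokedleaf):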
def get_table(url, driver):
    driver.get(url)
    main_window = driver.current_window_handle
    html_button = driver.find_element(By.XPATH, '//*[@title="View as HTML"]')
    html_button.send_keys(Keys.CONTROL + Keys.RETURN)
    driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)
    driver.switch_to.active_element
    try:
        while 'extranet.chem' not in driver.title:
            sleep(2)
            print('refreshing to get data')
            try:
                html_button.send_keys(Keys.CONTROL + Keys.RETURN)
            except Exception:
                print('deeper exception')
                driver.refresh()
    except:
        print('while exception')
        pass
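I use the nested except to catch a possible exception from the driver.refresh() call. For some reason, even though I call pass to ignore the exception, the loop still breaks when it looks up the driver title:
Error message: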
refreshing to get data
refreshing to get data
refreshing to get data
deeper exception
while exception
Traceback (most recent call last):
File "scraper.py", line 83, in <module>
get_latest()
File "scraper.py", line 28, in get_latest
url = row.find_element(By.XPATH, link_xpath).get_attribute('href')
File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 645, in find_element
{"using": by, "value": value})['value']
File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 628, in _execute
return self._parent.execute(command, params)
File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
self.error_handler.check_response(response)
File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 237, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of <tr class="ms-alternating"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
Why isn't this exception simply ignored?
Answer 0 (score: 1)
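Looking at the traceback you provided, you can see that the exception is raised on the line while 'extranet.chem' not in driver.title:
File "scraper.py", line 55, in get_table
while 'extranet.chem' not in driver.title:
which is not inside a try/except block. I'm not sure I have ever seen that exact exception when reading driver.title, but I assume it is normal. Without knowing anything about the page you are working with, we can't help you much further. Your option is to catch the exception generated on that line. If an alert box is present, you most likely won't be able to navigate away from or refresh the page until the alert has been handled. You should build in a way to handle alerts.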
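As a rough sketch of what "catching the exception generated on that line" could look like, the title check can be moved inside a try/except that lives inside the loop, so a failed lookup costs one attempt instead of ending the loop. The names wait_for_title and FlakyDriver are hypothetical and not part of Selenium; FlakyDriver is only a stand-in so the sketch runs without a browser. I am also assuming you would want a cap on the number of attempts rather than an endless loop.

```python
import time


def wait_for_title(driver, expected, attempts=10, delay=2.0,
                   ignored=(Exception,)):
    """Refresh until `expected` appears in driver.title or attempts run out.

    Returns True if the title matched, False otherwise.
    """
    for _ in range(attempts):
        try:
            # The title lookup itself may raise, so it sits inside the try.
            if expected in driver.title:
                return True
            driver.refresh()
        except ignored as exc:
            # Log and carry on with the next attempt instead of leaving the loop.
            print('ignored while polling:', type(exc).__name__)
        time.sleep(delay)
    return False


class FlakyDriver:
    """Hypothetical stand-in for a WebDriver (assumption, not Selenium API)."""

    def __init__(self, failures):
        self.failures = failures  # how many title lookups should raise
        self.refreshes = 0

    @property
    def title(self):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError('title lookup failed')
        # Pretend the page finally loads after two refreshes.
        return 'extranet.chem' if self.refreshes >= 2 else 'loading'

    def refresh(self):
        self.refreshes += 1


driver = FlakyDriver(failures=1)
found = wait_for_title(driver, 'extranet.chem', attempts=5, delay=0)
```

With a real driver you would probably narrow ignored to something like selenium.common.exceptions.WebDriverException rather than catching every Exception, and handle a pending alert inside the except branch before the next attempt.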