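I'm having a problem with PhantomJS: it can hang inside a loop without reporting any error. I know my code is fine, because after a restart it usually finishes (and may hang again later on). What I have in mind is something like this: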
i = 0
while i < len(url_list):
    try:
        driver.get(url_list[i])
        # do whatever needs to be done
        i += 1
        # go on to the next one
    except ThisIterationTakesTooLong:  # hypothetical exception, this is what I'm asking for
        # try this one again, because the code is definitely good
        continue
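Is it even possible to do something like this? Basically, I want to check in the background how long the current iteration has been running. I know about time.time(), but the problem is that it won't measure anything if the script hangs on a command that runs before the timer is checked.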
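A minimal sketch of such a background check, assuming a watchdog thread is acceptable: the call runs in a daemon thread and the loop waits at most a fixed number of seconds for it. `run_with_timeout` is a hypothetical helper, and `ThisIterationTakesTooLong` is borrowed from the pseudocode above. Python cannot kill a thread from outside, so a hung call keeps running in the background; with Selenium you would probably have to quit and recreate the driver after a timeout rather than keep reusing it.

```python
import threading
import time


class ThisIterationTakesTooLong(Exception):
    """Raised when a call does not finish within the allowed time."""


def run_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread, waiting at most `timeout` seconds.

    The hung call is NOT stopped: the daemon thread keeps running until it
    returns on its own or the interpreter exits.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:  # hand any error back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)  # returns after `timeout` seconds even if func is still running
    if worker.is_alive():
        raise ThisIterationTakesTooLong(f"call exceeded {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")


if __name__ == "__main__":
    print(run_with_timeout(lambda x: x * 2, 1.0, 21))  # 42
    try:
        run_with_timeout(time.sleep, 0.5, 2)  # sleeps 2 s, but we only wait 0.5 s
    except ThisIterationTakesTooLong as exc:
        print("timed out:", exc)
```

In the question's loop, the `try` body would call `run_with_timeout(driver.get, 30, url_list[i])` instead of `driver.get(url_list[i])`; the 30-second limit is an arbitrary example value.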
EDIT
After looking at the suggested question, I'm still stuck, because the signal module doesn't work for me:
import signal
signal.alarm(5)
raises "AttributeError: 'module' object has no attribute 'alarm'",
so it looks like I really can't use it.
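`signal.alarm` is only available on Unix, which would explain the `AttributeError` (most likely the script is running on Windows). On Linux or macOS, a SIGALRM-based timeout could be sketched as below. `Timeout` and `call_with_alarm` are hypothetical names, the limit is in whole seconds, and signal handlers can only be installed from the main thread.

```python
import signal
import time


class Timeout(Exception):
    """Raised by the SIGALRM handler when the time limit expires."""


def _on_alarm(signum, frame):
    raise Timeout()


def call_with_alarm(func, seconds, *args, **kwargs):
    """Unix-only: interrupt func if it runs longer than `seconds` (an int)."""
    if not hasattr(signal, "alarm"):
        raise RuntimeError("signal.alarm is not available on this platform")
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # schedule SIGALRM
    try:
        return func(*args, **kwargs)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        # restore the previous handler (None means it wasn't set from Python)
        signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    try:
        call_with_alarm(time.sleep, 1, 5)  # would sleep 5 s, interrupted after 1 s
    except Timeout:
        print("interrupted after 1 second")
```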
Answer (score: 1)
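I've run into this kind of thing before, and unfortunately there's no great way around it. The fact is that sometimes a page or element just won't load, and you have to decide what to do about it. I usually end up doing something like this: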
from selenium.common.exceptions import TimeoutException
# Assumes `driver` and `url_list` already exist.
# How long to wait for a page before timing out (seconds)
driver.set_page_load_timeout(10)
def wait_for_url(driver, url, max_attempts):
    """Make up to max_attempts attempts to load the page,
    each limited by the page load timeout.
    Return True on success, False if every attempt timed out."""
    attempts = 0
    while attempts < max_attempts:
        try:
            driver.get(url)
            return True
        except TimeoutException:
            # Prepare for another attempt
            attempts += 1
    # Bail out after max_attempts timeouts
    return False
# We'll use this for any URLs that won't load,
# so we can process them later.
revisit = []
for url in url_list:
    # Make 10 attempts before giving up.
    url_is_loaded = wait_for_url(driver, url, 10)
    if url_is_loaded:
        pass  # Do whatever
    else:
        revisit.append(url)
# Now we can try to process those unvisited URLs.
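The last comment leaves the second pass open. One way to fill it in, as a sketch: retry the failed URLs in a few rounds and report whatever still fails. `load_with_retries` and `process_in_passes` are hypothetical helpers; with Selenium you would pass `driver.get` as `load` and `TimeoutException` as `timeout_exc`, while the demo uses a fake loader and the built-in `TimeoutError` so it runs without a browser.

```python
def load_with_retries(load, url, max_attempts, timeout_exc=TimeoutError):
    """Call load(url) up to max_attempts times.

    Return True on the first success, False if every attempt timed out.
    """
    for _ in range(max_attempts):
        try:
            load(url)
            return True
        except timeout_exc:
            continue  # try again
    return False


def process_in_passes(urls, load, attempts_per_pass=3, max_passes=3,
                      timeout_exc=TimeoutError):
    """Retry failed URLs in up to max_passes rounds.

    Return (loaded, still_failing).
    """
    loaded = []
    pending = list(urls)
    for _ in range(max_passes):
        if not pending:
            break
        revisit = []
        for url in pending:
            if load_with_retries(load, url, attempts_per_pass, timeout_exc):
                loaded.append(url)  # a real script would scrape the page here
            else:
                revisit.append(url)
        pending = revisit  # only the failures go into the next round
    return loaded, pending


if __name__ == "__main__":
    failures_left = {"a": 0, "b": 4, "c": 100}  # simulated timeouts per URL

    def flaky_load(url):
        if failures_left[url] > 0:
            failures_left[url] -= 1
            raise TimeoutError(url)

    print(process_in_passes(["a", "b", "c"], flaky_load))  # → (['a', 'b'], ['c'])
```

Retrying in separate rounds, rather than hammering one URL, may give a slow server some time to recover before the next attempt.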
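I'd also add that the problem may be PhantomJS itself: recent versions of Selenium deprecate it, and in my experience it is sluggish and prone to unexpected behavior. If you need a headless browser, you can use Chrome, which is very stable. In case you're not familiar with it, the setup looks like this: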
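from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
chrome_options.add_argument("--headless")
# Selenium 4 style; replace the path with the location of your chromedriver
driver = webdriver.Chrome(service=Service("path/to/chromedriver"), options=chrome_options)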
Hopefully one of these suggestions helps.