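python - Continue the loop if this iteration takes too long

Date: 2018-06-12 12:05:36

Tags: python exception phantomjs

I'm having a problem with PhantomJS: it can hang inside a loop without reporting any error. I know my code is fine, because after a restart it usually runs to completion, or it hangs at some later point instead. My idea is something like this: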

i = 0
while i < len(url_list):
    try:
        driver.get(url_list[i])
        # do whatever needs to be done
        i = i+1
        # go on the next one
    except ThisIterationTakesTooLong:
        # try again for this one because the code is definitely good
        continue

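Is it even possible to do something like this? Basically, I want to check in the background how long the current loop iteration has been running. I know about time.time(), but the problem is that it won't measure anything if the code hangs on a command before the counter is reached.


Edit
After looking at the suggested question, I still have a problem, because the signal module doesn't work properly for me.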

import signal
signal.alarm(5)

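This throws "AttributeError: 'module' object has no attribute 'alarm'"
So it looks like I really can't use it.

1 Answer:

Answer 0: (score: 1)

I've run into this kind of thing before, and unfortunately there isn't a pretty fix for it. The fact is that sometimes a page or element simply won't load, and you have to make a choice. I usually end up doing something like this: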

from selenium.common.exceptions import TimeoutException

# How long to wait for page before timeout
driver.set_page_load_timeout(10)

def wait_for_url(driver, url, max_attempts):
    """Make multiple attempts to load page
    according to page load timeout, and
    max_attempts."""

    attempts = 0

    while attempts < max_attempts:

        try:
            driver.get(url)
            return True

        except TimeoutException:
            # Prepare for another attempt
            attempts += 1

            if attempts == max_attempts:
                # Bail once max_attempts is reached
                return False

# We'll use this if we find any urls that won't load
# so we can process later. 
revisit = []

for url in url_list:

    # Make 10 attempts before giving up.
    url_is_loaded = wait_for_url(driver, url, 10)

    if url_is_loaded:
        # Do whatever
        pass

    else:
        revisit.append(url)

# Now we can try to process those unvisited URLs.
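
If driver.get() hangs without ever raising TimeoutException, the page load timeout above won't help. signal.alarm is only available on Unix, which would explain the AttributeError if the script runs on Windows. One possible workaround, sketched here rather than tested against PhantomJS, is to run the call in a daemon thread and stop waiting after a deadline. The names IterationTookTooLong, call_with_timeout and slow_task are made up for this example; slow_task just stands in for a call that may hang.

```python
import threading
import time


class IterationTookTooLong(Exception):
    """Hypothetical exception, standing in for ThisIterationTakesTooLong."""


def call_with_timeout(func, timeout, *args):
    """Run func(*args) in a daemon thread; raise if it exceeds timeout seconds."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:
            # Hand the error back to the calling thread
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # The worker cannot be killed; we only stop waiting for it.
        raise IterationTookTooLong(f"no result after {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def slow_task(seconds):
    """Stand-in for a call that may hang, e.g. driver.get(url)."""
    time.sleep(seconds)
    return "done"
```

With Selenium this would look something like call_with_timeout(driver.get, 30, url) inside a try/except IterationTookTooLong. Keep in mind the abandoned thread is still stuck in the original call, so after a timeout the driver is probably in an unknown state and it is likely safer to quit it and create a new one before retrying.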

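I'd also add that the problem may be PhantomJS itself. Recent versions of Selenium have deprecated it. In my experience PhantomJS is sluggish and prone to unexpected behavior. If you need a headless browser, you can use Chrome, which is very stable. In case you're not familiar with it, it goes like this: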

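from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")

# Selenium 3 style call; Selenium 4 expects options=... and a
# Service object for the driver path.
driver = webdriver.Chrome("path/to/chromedriver", chrome_options=chrome_options)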

Maybe one of these suggestions will help.