Question

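I'm having trouble with PhantomJS: it can hang inside a loop without reporting any error. I know my code is fine, because after a restart it usually runs to completion, or hangs at some later point instead. What I have in mind is something like this: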

i = 0
while i < len(url_list):
    try:
        driver.get(url_list[i])
        # do whatever needs to be done
        i = i+1
        # go on the next one
    except ThisIterationTakesTooLong:
        # try again for this one because the code is definitely good
        continue

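Is something like this even possible? Basically, I want to check in the background how long the current iteration has been running. I know about time.time(), but the problem is that it can't measure anything if the code hangs on a command before the timer check is reached.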

Edit
After looking at the suggested question, I still have a problem, because the signal module doesn't work as expected.

import signal
signal.alarm(5)

This raises "AttributeError: 'module' object has no attribute 'alarm'".
So it looks like I really can't use it.

Answer 1

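I've run into this kind of thing before, and unfortunately there isn't a clean fix. The fact is that sometimes a page or element just won't load, and you have to make a choice. I usually end up doing something like this: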

from selenium.common.exceptions import TimeoutException

# How long to wait for page before timeout
driver.set_page_load_timeout(10)

def wait_for_url(driver, url, max_attempts):
    """Make multiple attempts to load page
    according to page load timeout, and
    max_attempts."""

    attempts = 0

    while attempts < max_attempts:

        try:
            driver.get(url)
            return True

        except TimeoutException:
            # Prepare for another attempt
            attempts += 1

            if attempts == max_attempts:
                # Bail on max_attempts
                return False

# We'll use this if we find any urls that won't load
# so we can process later. 
revisit = []

for url in url_list:

    # Make 10 attempts before giving up.
    url_is_loaded = wait_for_url(driver, url, 10)

    if url_is_loaded:
        # Do whatever
        pass

    else:
        revisit.append(url)

# Now we can try to process those unvisited URLs.
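
That last comment could be fleshed out roughly as follows. This is only a sketch: `retry_failed`, `load`, and `passes` are hypothetical names, and `load` stands in for any callable that returns True on success, for example `lambda url: wait_for_url(driver, url, 10)` followed by whatever processing the page needs.

```python
def retry_failed(urls, load, passes=2):
    """Give each URL in `urls` up to `passes` further chances to load.

    `load` is an assumed callable: it takes a URL and returns True
    if the page loaded (and was processed), False otherwise.
    Returns the URLs that still failed after all passes.
    """
    remaining = list(urls)

    for _ in range(passes):
        if not remaining:
            # Everything loaded; no need for another pass.
            break
        # Keep only the URLs that failed again on this pass.
        remaining = [url for url in remaining if not load(url)]

    return remaining
```

Anything still left in the returned list after the extra passes is probably worth logging and skipping rather than retrying forever.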

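I'd also add that the problem may be PhantomJS itself. Recent versions of Selenium have deprecated it. In my experience PhantomJS is sluggish and prone to unexpected behavior. If you need headless, you can use Chrome, which is very stable. In case you're not familiar with it, it looks like this: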

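from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")

# `options=` replaces the deprecated `chrome_options=` keyword.
# Recent Selenium 4 releases expect the driver path via a Service object instead.
driver = webdriver.Chrome("path/to/chromedriver", options=chrome_options)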
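
On the `signal.alarm` error from the question: `signal.alarm` is documented as Unix-only, so that AttributeError is what one would expect on Windows. A portable alternative, sketched below under assumptions, is to run the blocking call in a daemon thread and stop waiting after a deadline. `IterationTimeout` and `call_with_timeout` are hypothetical names, not part of Selenium or the standard library.

```python
import threading


class IterationTimeout(Exception):
    """Hypothetical exception: the wrapped call did not finish in time."""


def call_with_timeout(func, timeout, *args):
    """Run func(*args) in a daemon thread and wait at most `timeout` seconds.

    Returns the function's result, re-raises its exception, or raises
    IterationTimeout if it is still running when the deadline passes.
    The worker thread is NOT killed; it is only abandoned.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        raise IterationTimeout(f"call exceeded {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

In the loop from the question this could be used as `call_with_timeout(driver.get, 30, url_list[i])`, catching `IterationTimeout` in place of `ThisIterationTakesTooLong`. Because the stuck call keeps running in the background, it is probably safest to quit and recreate the driver before retrying.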

Maybe one of these suggestions will help.

python - Continue the loop if this iteration takes too long
