Question

I am trying to build a function that crawls an entire website. Today I got a TimeoutException:

Traceback (most recent call last):
  File "D:/Entwicklung/example/crawler/crawler.py", line 46, in crawl
    driver.get(tmp)
  File "C:\Users\test\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Users\test\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\test\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: chrome=75.0.3770.142)

The TimeoutException is raised when I pass the URL of a video to driver.get(). My loop keeps running, but after the first TimeoutException every subsequent driver.get() call raises a TimeoutException as well. Why is that?

while len(diff) > 0:
    tmp = diff.pop()
    visited.add(tmp)
    driver.get(tmp)

    elements = driver.find_elements_by_tag_name("a")

    for element in elements:
        href = element.get_attribute('href')
        if href is None:
            continue
        else:
            if main_url in href:
                links.add(href)

    diff = links.difference(visited)

Answer 1

Your code hit the default timeout, which is why you are seeing that message.

Have you tried using Waits? They give Selenium a bit more time before it moves on to the next block of code.

I also found a blog post: Dealing with Selenium timeouts. It changes Selenium's default timeout settings.
