I am trying to write a function that crawls an entire website. Today it failed with a TimeoutException:
Traceback (most recent call last):
  File "D:/Entwicklung/example/crawler/crawler.py", line 46, in crawl
    driver.get(tmp)
  File "C:\Users\test\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Users\test\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\test\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: chrome=75.0.3770.142)
The TimeoutException is raised when I pass the URL of a video to driver.get(). My loop keeps running, but every driver.get() call after that first TimeoutException raises a TimeoutException as well. Why does this happen?
while len(diff) > 0:
    tmp = diff.pop()
    visited.add(tmp)
    driver.get(tmp)
    elements = driver.find_elements_by_tag_name("a")
    for element in elements:
        href = element.get_attribute('href')
        if href is None:
            continue
        else:
            if main_url in href:
                links.add(href)
    diff = links.difference(visited)
Answer 0 (score: 0)
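Your code is hitting the default timeout, which is why you see that message.
Have you tried using Waits? They give Selenium a bit more time before it moves on to the next block of code.
I also found a blog post on this: Dealing with Selenium timeouts. It changes Selenium's default timeout settings.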
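To make the timeout suggestion concrete, here is a minimal sketch of the crawl loop with an explicit page-load timeout and a try/except around driver.get(). The names crawl, page_timeout and failed are made up for illustration and are not from the question. The sketch uses driver.find_elements("tag name", "a") in place of find_elements_by_tag_name so it does not depend on the Selenium version, and the fallback TimeoutException class exists only so the snippet can be exercised without Selenium installed. It has not been tried against the video URL from the question, so whether it stops the follow-up timeouts depends on why the browser hangs there.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback so the sketch can be read and exercised without Selenium installed.
    class TimeoutException(Exception):
        pass


def crawl(driver, start_url, main_url, page_timeout=30):
    """Visit same-site links reachable from start_url.

    Returns (visited, failed), where failed holds the URLs whose page load
    timed out. All names here are hypothetical, not part of the question.
    """
    # Seconds a single driver.get() may take before TimeoutException is raised.
    driver.set_page_load_timeout(page_timeout)

    visited, failed = set(), set()
    pending = {start_url}
    while pending:
        url = pending.pop()
        visited.add(url)
        try:
            driver.get(url)
        except TimeoutException:
            # Record the slow page and move on instead of aborting the crawl.
            failed.add(url)
            continue
        # "tag name" is the locator string behind By.TAG_NAME.
        for element in driver.find_elements("tag name", "a"):
            href = element.get_attribute("href")
            if href and main_url in href and href not in visited:
                pending.add(href)
    return visited, failed
```

With a real browser this would be called as crawl(driver, main_url, main_url), using the driver and main_url that the question's code already has. If calls keep timing out after the first failure, one thing to try (an assumption, not verified here) is quitting the driver and starting a fresh one inside the except branch.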