Question

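I have a Python script that lists the titles of a long list of websites. This takes a long time, so the script has to run for hours. Occasionally, however, it fails with the error "Failed to decode response from marionette".

From what I have read, the cause of this error does not seem to be well understood. Getting rid of it is not my priority. What I want is for the script not to stop completely when the error occurs, which is what currently happens.

How can I do that?

Here is the code: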

from pyvirtualdisplay import Display
from time import sleep
import sys
# Python 2 only: force UTF-8 as the default string encoding
reload(sys)
sys.setdefaultencoding('utf-8')
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options
# Run Firefox inside a virtual display
display = Display(visible=0, size=(800, 600))
display.start()
urlsFile = open("urls.txt", "r")
urls = urlsFile.readlines()
driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.set_page_load_timeout(60)
for url in urls:
    try:
        driver.get(url)
        sleep(0.8)
        print(driver.title)
    except TimeoutException as e:
        print("Timeout")

Answer 1

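Note: this is my first time writing Python.

You just need to structure the code so that a failed GET is retried. You will still want to give up after a certain number of attempts, but this should at least cover a one-off failure for any given URL.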

from selenium.common.exceptions import TimeoutException

def retryable_get(self, url, max_tries=5):
    attempts = 0
    while attempts < max_tries:
        try:
            self.get(url)
            return  # success: stop retrying
        except Exception:
            print('An error occurred performing a GET to ' + url)
        attempts += 1
    # Every attempt failed: report it to the caller
    raise TimeoutException(f'Failed to GET {url} after {max_tries} attempts')

You can call it like this:

retryable_get(driver, url)
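
A quick way to check the retry behaviour without launching a browser is to call a helper like this on a stand-in object. The sketch below is only an illustration under stated assumptions: `FlakyDriver` is a hypothetical stub (not part of Selenium) whose `get` fails a fixed number of times before succeeding, and `retry_get` is a self-contained variant of the helper that raises a plain `RuntimeError`, so it runs without Selenium installed.

```python
class FlakyDriver:
    """Hypothetical stand-in for a WebDriver: fails `failures` times, then works."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0
        self.title = None

    def get(self, url):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("Failed to decode response from marionette")
        self.title = "Title of " + url


def retry_get(driver, url, max_tries=5):
    """Call driver.get(url) until it succeeds or max_tries attempts are used up."""
    for attempt in range(1, max_tries + 1):
        try:
            driver.get(url)
            return attempt  # number of attempts it took
        except Exception as exc:
            print("Attempt {} failed for {}: {}".format(attempt, url, exc))
    raise RuntimeError("Failed to GET {} after {} attempts".format(url, max_tries))


driver = FlakyDriver(failures=2)
attempts_used = retry_get(driver, "http://example.com")
print(attempts_used, driver.title)
```

Here the first two calls fail and the third succeeds, so the caller never sees the transient error. Only a URL that fails on every attempt raises.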

Or, if you want a more object-oriented approach, you can duck-punch (monkey-patch) the Firefox class:

webdriver.Firefox.retryable_get = retryable_get

for url in urls:
  try:
    driver.retryable_get(url)
    sleep(0.8)
    print(driver.title)
  except TimeoutException as e:
    print("Timeout")
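
Since the goal is for the run to survive a URL that keeps failing, it may also help to record such URLs and move on, rather than only printing a message. The following is a minimal sketch, not a definitive implementation: `collect_titles` and `fake_fetch_title` are hypothetical names, and `fetch_title` stands in for the pair of calls `driver.retryable_get(url)` and `driver.title`. With real Selenium, catching `WebDriverException` (the base class of `TimeoutException`) would be a narrower choice than the broad `Exception` used here.

```python
def collect_titles(urls, fetch_title):
    """Return (titles, failed): titles maps url -> title, failed lists skipped urls."""
    titles = {}
    failed = []
    for raw in urls:
        url = raw.strip()  # lines read from a file keep their trailing newline
        if not url:
            continue  # ignore blank lines
        try:
            titles[url] = fetch_title(url)
        except Exception as exc:  # broad on purpose: one bad URL must not end the run
            print("Skipping {}: {}".format(url, exc))
            failed.append(url)
    return titles, failed


def fake_fetch_title(url):
    """Hypothetical stand-in for the Selenium calls; fails for one URL."""
    if "bad" in url:
        raise RuntimeError("Failed to decode response from marionette")
    return "Title of " + url


titles, failed = collect_titles(
    ["http://a.example\n", "http://bad.example\n", "\n", "http://b.example\n"],
    fake_fetch_title,
)
print(titles)
print(failed)
```

The list of failed URLs could then be written to a file and retried in a later run.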
