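I am using Firefox GeckoDriver with Selenium to download some files from a website on behalf of a client.
The setup runs on Digital Ocean with Docker.
The flow is as follows: whenever a user calls the API, a new browser instance is created, logs in to the website with the user's login ID and password, downloads a bunch of files, creates a zip, and sends it back.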
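To make the flow concrete, here is a minimal sketch of the "download into a private directory, then zip it" part. It is not the actual handler: `handle_request`, `zip_downloads`, and the `download_files` callback (a stand-in for "start the browser, log in, download") are hypothetical names used only for illustration.

```python
import os
import uuid
import zipfile


def zip_downloads(download_dir, zip_path):
    """Pack every regular file in download_dir into zip_path; return the names added."""
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(download_dir)):
            full_path = os.path.join(download_dir, name)
            if os.path.isfile(full_path):
                archive.write(full_path, arcname=name)
                added.append(name)
    return added


def handle_request(download_files, base_dir):
    """Hypothetical request handler: one private download directory per request."""
    request_id = uuid.uuid4().hex
    download_dir = os.path.join(base_dir, request_id)
    os.makedirs(download_dir)
    # Assumption: download_files(download_dir) drives the browser session
    # and leaves the downloaded files in download_dir.
    download_files(download_dir)
    zip_path = os.path.join(base_dir, request_id + ".zip")
    zip_downloads(download_dir, zip_path)
    return zip_path
```

Keeping one directory per request means the zip step only ever sees files that belong to that request, even when several requests run at once.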
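Everything seems to work fine when there is only one request, meaning only one browser instance on the server. But when there are multiple requests, meaning multiple browser instances on the server, all of the requests break with the error message "Browsing context has been discarded".
This happens either during the crawling part or right after the instance is created.
There is no specific pattern to this error; it occurs randomly and breaks the browser instance. I have gone through all the questions and GitHub issues on this topic, but some of them are outdated workarounds that do not work in the current version to begin with, and others do not work at all.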
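For anyone trying to reproduce the multi-request case, a small harness along these lines may help. It is only a sketch: `run_concurrently` is a hypothetical helper, `task` stands in for "create a browser instance and crawl", and running the requests in threads is an assumption about how the server handles them.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(task, request_count):
    """Run task(i) for i in range(request_count) in parallel threads.

    Returns a list of (index, result, error) tuples in index order;
    error is None when the task succeeded. request_count must be >= 1.
    """
    def attempt(index):
        try:
            return index, task(index), None
        except Exception as exc:  # record the failure and let the others continue
            return index, None, exc

    with ThreadPoolExecutor(max_workers=request_count) as pool:
        # pool.map yields results in the same order as the inputs
        return list(pool.map(attempt, range(request_count)))
```

Collecting the exception per request, instead of stopping at the first one, makes it easier to see whether every instance fails or only some of them.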
Here are the browser version and configuration that I am running on Jenkins.
`
{'browserName': 'firefox',
 'marionette': True,
 'acceptInsecureCerts': True,
 'moz:firefoxOptions': {
     'prefs': {
         'browser.download.folderList': 2,
         'browser.download.dir': '/home/usr/usr/project/static/785fg7',
         'browser.download.useDownloadDir': True,
         'pdfjs.disabled': True,
         'browser.helperApps.neverAsk.saveToDisk':
             'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,'
             'application/pdf,'
             'application/csv,'
             'application/excel,'
             'application/vnd.msexcel,'
             'application/vnd.ms-excel,'
             'text/anytext,'
             'text/comma-separated-values,'
             'text/csv,'
             'application/vnd.ms-excel,'
             'application/octet-stream,'
             'image/tiff'},
     'args': ['-headless',
              '--no-sandbox',
              '--disable-setuid-sandbox',
              '--disable-dev-shm-usage',
              '--window-size=1920,1080',
              '--start-maximized']}}
`
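The `browser.helperApps.neverAsk.saveToDisk` value above is one comma-separated string, and `application/vnd.ms-excel` appears in it twice. As a side note, a string like this could be built from a list so that repeats and stray whitespace are easy to spot. This is just a sketch; `build_never_ask_value` and `MIME_TYPES` are hypothetical names, not part of my project.

```python
def build_never_ask_value(mime_types):
    """Join MIME types with commas, dropping blanks and repeats, keeping first-seen order."""
    seen = set()
    ordered = []
    for mime in mime_types:
        mime = mime.strip()
        if mime and mime not in seen:
            seen.add(mime)
            ordered.append(mime)
    return ",".join(ordered)


# The same types as in the configuration, including the repeated entry.
MIME_TYPES = [
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/pdf",
    "application/csv",
    "application/excel",
    "application/vnd.msexcel",
    "application/vnd.ms-excel",
    "text/anytext",
    "text/comma-separated-values",
    "text/csv",
    "application/vnd.ms-excel",
    "application/octet-stream",
    "image/tiff",
]

NEVER_ASK = build_never_ask_value(MIME_TYPES)
```

The result could then be passed to `set_preference("browser.helperApps.neverAsk.saveToDisk", NEVER_ASK)` in place of the hand-written string.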
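Please note that this works fine locally no matter how many browser instances I create. The problem occurs only when it is deployed on the server. On my local system, headless mode handles any number of requests and everything works fine.
Here is the Python code that launches the browser.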
import logging
import os
import random as r

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# `constants` and `check_gecko_version` are defined elsewhere in the project (not shown).


def get_firefox_driver_for_linux_server(apply_proxy, uuid_user, download_options=False):
    firefox_options = Options()
    firefox_options.set_headless()
    if download_options:
        # Save downloads into a per-user directory without showing a prompt.
        if not os.path.exists(constants.DOWNLOADS_PATH):
            os.mkdir(constants.DOWNLOADS_PATH)
        download_path = os.path.join(constants.DOWNLOADS_PATH, uuid_user)
        firefox_options.set_preference("browser.download.folderList", 2)
        firefox_options.set_preference("browser.download.dir", download_path)
        firefox_options.set_preference("browser.download.useDownloadDir", True)
        firefox_options.set_preference("pdfjs.disabled", True)
        firefox_options.set_preference("browser.helperApps.neverAsk.saveToDisk",
                                       "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,"
                                       "application/pdf,"
                                       "application/csv,"
                                       "application/excel,"
                                       "application/vnd.msexcel,"
                                       "application/vnd.ms-excel,"
                                       "text/anytext,"
                                       "text/comma-separated-values,"
                                       "text/csv,"
                                       "application/vnd.ms-excel,"
                                       "application/octet-stream,"
                                       "image/tiff")
    firefox_options.add_argument("--no-sandbox")
    firefox_options.add_argument("--disable-setuid-sandbox")
    firefox_options.add_argument('--disable-dev-shm-usage')
    firefox_options.add_argument("--window-size=1920,1080")
    firefox_options.add_argument("--start-maximized")
    if not os.path.exists(constants.LOG_PATH):
        os.mkdir(constants.LOG_PATH)
    # Pick a random id for this instance's log file and create the empty file.
    global random_id
    random_id = str(r.randint(1, 99999))
    logging.warning("random id...{}".format(random_id))
    with open(os.path.join(constants.LOG_PATH, random_id + '.log'), 'w+') as lf:
        pass
    gecko_driver_path = "/usr/local/bin/geckodriver"
    if apply_proxy:
        proxy = "proxy:24000"
        # Copy the defaults so the shared class-level dict is not mutated.
        firefox_capabilities = webdriver.DesiredCapabilities.FIREFOX.copy()
        firefox_capabilities['marionette'] = True
        firefox_capabilities['proxy'] = {
            "proxyType": "MANUAL",
            "httpProxy": proxy,
            "ftpProxy": proxy,
            "sslProxy": proxy
        }
        driver = webdriver.Firefox(executable_path=gecko_driver_path, firefox_options=firefox_options,
                                   capabilities=firefox_capabilities,
                                   log_path=os.path.join(constants.LOG_PATH, random_id + '.log'))
        check_gecko_version(driver, firefox_options)
        return driver
    else:
        logging.info("No proxy applied")
        # Note: this branch does not pass log_path, so the log file created above is not used here.
        driver = webdriver.Firefox(executable_path=gecko_driver_path, firefox_options=firefox_options)
        check_gecko_version(driver, firefox_options)
        return driver
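
One detail in the code above that may matter when several requests arrive together: the log file name comes from `random.randint(1, 99999)` and is stored in a module-level global, so two simultaneous requests could pick the same id or overwrite each other's value. I do not know whether this is related to the error. A minimal sketch of a per-call alternative follows; `new_log_path` is a hypothetical helper, not something in my current code.

```python
import os
import uuid


def new_log_path(log_dir):
    """Create log_dir if needed and return a practically unique .log path inside it."""
    os.makedirs(log_dir, exist_ok=True)
    # uuid4 gives a random 128-bit id, so collisions between concurrent
    # requests are not a practical concern, and no shared global is needed.
    return os.path.join(log_dir, uuid.uuid4().hex + ".log")
```

The returned path could be passed as `log_path` when the driver is created, so that each browser instance writes to its own log file.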