Error when opening multiple webdrivers with multithreading in Python

Time: 2014-12-28 15:40:49

Tags: python multithreading selenium

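I am scraping web pages with Python and trying to use multithreading to speed things up. I also use Selenium, so each thread opens its own webdriver. With 4 threads the program runs fine, but with 5 or more threads it fails with an error like this: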

WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\apogne\\appdata\\local\\temp\\tmpqzhfxq.webdriver.xpi\\platform\\WINNT_x86-msvc\\components\\webdriver-firefox-latest.dll'

The program can be reduced to the following, which still produces the same error.

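from selenium import webdriver
from threading import Thread

def f():
    # Each thread starts its own Firefox instance, then closes it
    driver = webdriver.Firefox()
    driver.close()

thread_list = []

# Start 5 threads; the error appears once 5 or more start together
for i in range(5):
    t = Thread(target=f)
    t.start()
    thread_list.append(t)

# Wait for every thread to finish
for t in thread_list:
    t.join()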

The full traceback is below.

Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\pinterest\user_info_multiThread.py", line 21, in gettingUserInfo
    driver = webdriver.Firefox()
  File "C:\Python27\lib\selenium\webdriver\firefox\webdriver.py", line 59, in __init__
    self.binary, timeout),
  File "C:\Python27\lib\selenium\webdriver\firefox\extension_connection.py", line 45, in __init__
    self.profile.add_extension()
  File "C:\Python27\lib\selenium\webdriver\firefox\firefox_profile.py", line 92, in add_extension
    self._install_extension(extension)
  File "C:\Python27\lib\selenium\webdriver\firefox\firefox_profile.py", line 285, in _install_extension
    shutil.rmtree(tmpdir)
  File "C:\Python27\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Python27\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Python27\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\apogne\\appdata\\local\\temp\\tmpadxbvj.webdriver.xpi\\platform\\WINNT_x86-msvc\\components\\webdriver-firefox-previous.dll'

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\pinterest\user_info_multiThread.py", line 21, in gettingUserInfo
    driver = webdriver.Firefox()
  File "C:\Python27\lib\selenium\webdriver\firefox\webdriver.py", line 61, in __init__
    keep_alive=True)
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 73, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 121, in start_session
    'desiredCapabilities': desired_capabilities,
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 173, in execute
    self.error_handler.check_response(response)
  File "C:\Python27\lib\selenium\webdriver\remote\errorhandler.py", line 166, in check_response
    raise exception_class(message, screen, stacktrace)
WebDriverException: Message: u'c is null' ; Stacktrace: 
    at nsCommandProcessor.prototype.newSession (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/command-processor.js:11751:61)
    at nsCommandProcessor.prototype.execute (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/command-processor.js:11646:7)
    at Dispatcher.executeAs/< (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/driver-component.js:8430:5)
    at Resource.prototype.handle (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/driver-component.js:8577:219)
    at Dispatcher.prototype.dispatch (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/driver-component.js:8524:36)
    at WebDriverServer/<.handle (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/driver-component.js:11466:5)
    at createHandlerFunc/< (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/httpd.js:1935:41)
    at ServerHandler.prototype.handleResponse (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/httpd.js:2261:15)
    at Connection.prototype.process (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/httpd.js:1168:5)
    at RequestReader.prototype._handleResponse (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/httpd.js:1616:5)
    at RequestReader.prototype._processBody (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/httpd.js:1464:9)
    at RequestReader.prototype.onInputStreamReady (file:///c:/users/apogne/appdata/local/temp/tmpr0rxvj/extensions/fxdriver@googlecode.com/components/httpd.js:1333:9) 

Exception in thread Thread-4:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\pinterest\user_info_multiThread.py", line 24, in gettingUserInfo
    driver.get("http://www.pinterest.com")
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 185, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 171, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Python27\lib\selenium\webdriver\remote\remote_connection.py", line 349, in execute
    return self._request(command_info[0], url, body=data)
  File "C:\Python27\lib\selenium\webdriver\remote\remote_connection.py", line 380, in _request
    resp = self._conn.getresponse()
  File "C:\Python27\lib\httplib.py", line 1067, in getresponse
    response.begin()
  File "C:\Python27\lib\httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "C:\Python27\lib\httplib.py", line 373, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''

Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\pinterest\user_info_multiThread.py", line 24, in gettingUserInfo
    driver.get("http://www.pinterest.com")
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 185, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 171, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Python27\lib\selenium\webdriver\remote\remote_connection.py", line 349, in execute
    return self._request(command_info[0], url, body=data)
  File "C:\Python27\lib\selenium\webdriver\remote\remote_connection.py", line 380, in _request
    resp = self._conn.getresponse()
  File "C:\Python27\lib\httplib.py", line 1067, in getresponse
    response.begin()
  File "C:\Python27\lib\httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "C:\Python27\lib\httplib.py", line 373, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''

Exception in thread Thread-5:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\pinterest\user_info_multiThread.py", line 24, in gettingUserInfo
    driver.get("http://www.pinterest.com")
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 185, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python27\lib\selenium\webdriver\remote\webdriver.py", line 171, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Python27\lib\selenium\webdriver\remote\remote_connection.py", line 349, in execute
    return self._request(command_info[0], url, body=data)
  File "C:\Python27\lib\selenium\webdriver\remote\remote_connection.py", line 380, in _request
    resp = self._conn.getresponse()
  File "C:\Python27\lib\httplib.py", line 1067, in getresponse
    response.begin()
  File "C:\Python27\lib\httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "C:\Python27\lib\httplib.py", line 373, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''

Does anyone know why this error occurs and how I can fix it?

1 Answer:

Answer 0 (score: 0)

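You should create a lock around

driver=webdriver.Firefox()

so that only one thread at a time can bootstrap the driver. Your traceback shows the WindowsError being raised inside webdriver.Firefox() itself, while the profile's extension is being installed and its temporary directory removed, so serializing that step should keep the threads from interfering with each other during startup.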

Edit:

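from selenium import webdriver
from threading import Thread, Lock

# Shared lock that serializes driver startup across all threads
my_lock = Lock()

def f():
    # The thread either acquires the lock or waits for another thread to release it
    with my_lock:
        # Only one thread at a time initializes a driver
        driver = webdriver.Firefox()

    # Do your other work here, outside the lock, so the threads still run in parallel

    driver.close()

thread_list = []

for _ in range(5):  # range works on both Python 2 and Python 3
    t = Thread(target=f)
    t.start()
    thread_list.append(t)

for t in thread_list:
    t.join()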
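
If you want to check the locking pattern without launching any browsers, here is a minimal sketch. It assumes a stand-in factory in place of webdriver.Firefox: the names FakeDriverFactory, start_serialized and run_workers are hypothetical helpers made up for this illustration and are not part of Selenium. The fake factory records how many threads are inside it at the same moment, so you can see that the lock keeps that number at one.

```python
import threading
import time

# One lock shared by all threads; it guards only the startup step.
startup_lock = threading.Lock()


class FakeDriverFactory(object):
    """Hypothetical stand-in for webdriver.Firefox that tracks concurrent calls."""

    def __init__(self):
        self._counter_lock = threading.Lock()
        self.active = 0  # threads currently inside __call__
        self.peak = 0    # highest value 'active' ever reached

    def __call__(self):
        with self._counter_lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # simulate slow startup work
        with self._counter_lock:
            self.active -= 1
        return object()  # placeholder for a driver instance


def start_serialized(factory):
    """Run factory() while holding startup_lock, so calls never overlap."""
    with startup_lock:
        return factory()


def run_workers(factory, n):
    """Start n threads that each create one 'driver'; return the created objects."""
    drivers = []
    drivers_lock = threading.Lock()

    def worker():
        driver = start_serialized(factory)
        # Real work with the driver would go here, outside startup_lock.
        with drivers_lock:
            drivers.append(driver)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return drivers
```

To use this with real browsers you would pass webdriver.Firefox as the factory. Only the startup is serialized, so the threads still do their page loads in parallel once each driver exists.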