Python requests with multiprocessing

Time: 2018-06-01 09:14:02

Tags: python python-requests python-multiprocessing multiprocess

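I am working on a vulnerability management tool that has to send GET requests to a remote API. I am using multiprocess to run the requests in parallel. The problem is that the script never terminates: it performs some requests and then blocks. Here is part of my code: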

import time

import requests
from bs4 import BeautifulSoup
from multiprocess import Pool

def check_vulnerability(package):
    req = ''
    while req == '':
        try:
            headers = {'User-agent': 'Mozilla/11.0'}
            time.sleep(0.3)
            req = requests.get('https://access.redhat.com/labs/securitydataapi/cve.xml?package=' + package, headers=headers)
            break
        except:  # bare except: retries forever on any error, including KeyboardInterrupt
            print("Retrying . . .")
            #time.sleep(0.3)
            continue
    soup = BeautifulSoup(req.text, 'xml')
    #some code to process soup and print partial results

def main():
    start_time = time.time()
    packages = fetch_packages()  #list of strings (defined elsewhere)
    p = Pool(int(results.thread)) #from argv
    all = p.map(check_vulnerability, packages)
    print("\n" + "Finished in : " + str(int(time.time() - start_time)) + "s")

if __name__ == "__main__":
    main()

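When I run this, it prints several lines (partial results) but never finishes. When I press Ctrl+C, it prints "Retrying . . .", which means it is blocked while sending a request. Then the following traceback is printed: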

Traceback (most recent call last):
  File "/home/user/.local/lib/python2.7/site-packages/multiprocess/process.py", line 258, in _bootstrap
    self.run()
  File "/home/user/.local/lib/python2.7/site-packages/multiprocess/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/.local/lib/python2.7/site-packages/multiprocess/pool.py", line 102, in worker
    task = get()
  File "/home/user/.local/lib/python2.7/site-packages/multiprocess/queues.py", line 379, in get
    racquire()
KeyboardInterrupt

I think the problem is that I am sending too many requests to the remote server. What should I do? Any help is appreciated. Thanks.

Edit 1: I removed the while loop and caught the exception. It is:

HTTPSConnectionPool(host='access.redhat.com', port=443): Max retries exceeded with url: /labs/securitydataapi/cve.xml?package=librelp-1.2.0-3.el7.i686 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))

I tried using Session(), but I get the same problem.

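Edit 2: I think the problem is in my p.map call, because some workers end up blocked. I want the script to keep running even if some workers are blocked, or better, I do not want any worker to block at all (they are usually waiting on the SSL handshake). When I tried imap instead, the script finished without processing any package (with map it starts processing some packages and then blocks).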

1 answer:

Answer 0: (score: 0)

Regarding the multiprocessing issue, try the following:

from multiprocessing.dummy import Pool as ThreadPool 

pool = ThreadPool(10) # say
all = pool.map(check_vulnerability, packages)
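A minimal sketch of the same idea that can be run without network access. The names `fake_check` and `run_checks` are hypothetical and only stand in for `check_vulnerability` and the pool setup. It uses `imap_unordered`, which returns a lazy iterator: the results have to be consumed (here with `list`) before the program exits, which may be why `imap` appeared to process nothing in Edit 2.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


def fake_check(package):
    # Hypothetical stand-in for check_vulnerability: sleeps briefly
    # instead of calling the remote API, then returns a result pair.
    time.sleep(0.01)
    return package, "checked"


def run_checks(worker, packages, n_threads=10):
    # imap_unordered yields each result as soon as its worker finishes.
    # Wrapping it in list() consumes the iterator, so this call waits
    # until every item has been processed.
    pool = ThreadPool(n_threads)
    try:
        return list(pool.imap_unordered(worker, packages))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(sorted(run_checks(fake_check, ["bash", "openssl", "zlib"])))
```

Note that this still waits forever if a single worker never returns, so the worker itself needs a timeout on its network call.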

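This package differs slightly: multiprocessing.dummy exposes the same Pool API but is backed by threads instead of processes. Regarding your SSL issue, if you are not interested in checking the certificate, you can pass verify=False to requests.get:
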
req = requests.get('https://access.redhat.com/labs/securitydataapi/cve.xml?package='+package, headers = headers, verify=False)
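Separately from certificate checking, the hang described in the question is consistent with two things in the original code: `requests.get` is called without a `timeout` (requests does not time out unless one is passed), and the bare `except` retries forever. The sketch below bounds both. The helper names `get_with_retries` and `fetch_cve_xml`, the attempt count, and the timeout values are assumptions for illustration, not part of the original code.

```python
import time


def get_with_retries(fetch, url, max_attempts=3, delay=0.3,
                     retry_on=(Exception,)):
    # Call fetch(url) at most max_attempts times. Only exceptions listed
    # in retry_on are retried, so KeyboardInterrupt still stops the script.
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except retry_on as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay)
    # Every attempt failed: surface the last error instead of looping.
    raise last_error


def fetch_cve_xml(package, timeout=(5, 30)):
    # Imported here so get_with_retries can be used on its own.
    import requests

    url = ('https://access.redhat.com/labs/securitydataapi/cve.xml?package='
           + package)
    headers = {'User-agent': 'Mozilla/11.0'}

    def fetch(u):
        # timeout=(connect, read) in seconds; the values are assumptions.
        return requests.get(u, headers=headers, timeout=timeout)

    return get_with_retries(
        fetch, url, retry_on=(requests.exceptions.RequestException,))
```

With this shape, a worker that cannot complete the handshake raises after a few attempts instead of blocking `pool.map` indefinitely; the caller can catch the exception per package and continue.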