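I am trying to use multiprocessing to speed up a parser that runs in Python on AWS Lambda.
It works fine locally, but when I run it on Lambda I get the following error. Why does this error occur, and how can I fix it?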
{
  "errorMessage": "[Errno 24] Too many open files",
  "errorType": "OSError",
  "stackTrace": [
    " File \"/var/task/lambda_function.py\", line 64, in lambda_handler\n p.start()\n",
    " File \"/var/lang/lib/python3.7/multiprocessing/process.py\", line 112, in start\n self._popen = self._Popen(self)\n",
    " File \"/var/lang/lib/python3.7/multiprocessing/context.py\", line 223, in _Popen\n return _default_context.get_context().Process._Popen(process_obj)\n",
    " File \"/var/lang/lib/python3.7/multiprocessing/context.py\", line 277, in _Popen\n return Popen(process_obj)\n",
    " File \"/var/lang/lib/python3.7/multiprocessing/popen_fork.py\", line 20, in __init__\n self._launch(process_obj)\n",
    " File \"/var/lang/lib/python3.7/multiprocessing/popen_fork.py\", line 69, in _launch\n parent_r, child_w = os.pipe()\n"
  ]
}
My code is as follows:
import multiprocessing
from multiprocessing import Manager

import requests

def parse(item, L):
    # Fetch the URL in item[0]; item[1] identifies the row being parsed.
    r = requests.get(item[0], headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'},
                     cookies={'name': 'Parser',
                              'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'})
    if r.status_code == 200:
        if item[1] not in L:
            L.append([r.text, r.status_code, item[1]])
    else:
        L.append([None, r.status_code, item[1]])

def lambda_handler(event, context):
    # Make DB Connection
    with Manager() as manager:
        L = manager.list()  # <-- can be shared between processes.
        processes = []
        # One process is started per item in sqlSelectVars.
        for item in sqlSelectVars:
            p = multiprocessing.Process(target=parse, args=(item, L))
            processes.append(p)
            p.start()
        for process in processes:
            process.join()
        # Commit my parsed values to the DB with PyMySQL
        try:
            cursor.executemany(sqlUpdateRawHtml, L)
            cursor.connection.commit()
Answer 0 (score: 0)
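Going only by your description, I suspect the cause is that you are creating too many processes. The traceback fails at os.pipe() inside p.start(), and your loop starts one process per item in sqlSelectVars, so the available file descriptors run out.
The AWS Lambda Limits page in the Lambda documentation lists:
File descriptors: 1,024
Execution processes/threads: 1,024
So this part needs to change: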
for item in sqlSelectVars:
    p = multiprocessing.Process(target=parse, args=(item, L))
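On the other hand, you say you want to "use multiprocessing to speed up the parser". If parse only does computation, starting more processes than there are CPU cores does not help; it only adds process-creation overhead.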
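One way to change that loop, offered as a rough sketch rather than something tested on Lambda, is to start the processes in small batches and release each one before starting the next batch, so that the number of open pipe descriptors stays bounded. The names run_in_batches, chunks, record and max_procs are made up for this example, and record is only a stand-in for parse that does no network access. The sketch assumes the fork start method (the one visible in the traceback) and Python 3.7 or later for Process.close().

```python
import multiprocessing

# Assumption: a Unix host using the "fork" start method, which is what the
# traceback in the question (popen_fork.py) shows.
ctx = multiprocessing.get_context("fork")


def chunks(items, size):
    """Yield successive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(target, items, shared, max_procs=4):
    """Call target(item, shared) in child processes, at most max_procs at a time."""
    for batch in chunks(list(items), max_procs):
        procs = [ctx.Process(target=target, args=(item, shared)) for item in batch]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
            # Python 3.7+: release the resources (including the pipe
            # descriptor) that the parent holds for this finished child.
            p.close()


def record(item, shared):
    """Hypothetical stand-in for parse(): stores a derived value, no HTTP request."""
    shared.append(item * 2)


if __name__ == "__main__":
    with ctx.Manager() as manager:
        shared = manager.list()  # shared between processes, as in the question
        run_in_batches(record, range(10), shared, max_procs=3)
        print(sorted(list(shared)))
```

In the handler this would take the place of the for loop, along the lines of run_in_batches(parse, sqlSelectVars, L). A suitable value for max_procs depends on how the function is configured and is something to tune, not a recommended number.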