The input file (a.txt) contains data like this:
{person1: [www.person1links1.com]}
{person2: [www.person2links1.com,www.person2links2.com]}...(36000 lines of such data)
I want to extract data from each person's links. My code is:
def get_bio(authr,urllist):
    author_data=[]
    for each in urllist:
        try:
            html = urllib.request.urlopen(each).read()
            author_data.append(html)
        except:
            continue
    f=open(authr+'.txt','w+')
    for each in author_data:
        f.write(str(each))
        f.write('\n')
        f.write('********************************************')
        f.write('\n')
    f.close()

if __name__ == '__main__':
    q=mp.Queue()
    processes=[]
    with open('a.txt') as f:
        for each in f:
            q.put(each)# dictionary
    while (q.qsize())!=0:
        for authr,urls in q.get().items():
            p=mp.Process(target=get_bio,args=(authr,urls))
            processes.append(p)
            p.start()
    for proc in processes:
        proc.join()
I get the following error when running this code (I tried setting ulimit but got the same error):
OSError: [Errno 24] Too many open files: 'personx.txt'
Traceback (most recent call last):
  File "perbio_mp.py", line 88, in <module>
    p.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 66, in _launch
    parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files
Please point out where I am going wrong and how I can correct it. Thanks.
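Answer 0 (score: 0)
Check the maximum number of file descriptors your operating system allows per process. Some versions of Mac OS X have a default limit of 256 open files, for example El Capitan (OS X 10.11).
In any case, you can run the following command:
ulimit -n 4096
before running your Python code.
If your code still breaks, check how many times the function get_bio(authr, urllist) is called. What is probably happening is that your loop opens more files than the OS can handle.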
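The same limit can also be inspected, and raised up to the hard limit, from inside Python with the standard resource module. This is only a rough sketch for Unix-like systems (resource is not available on Windows); the helper names open_file_limits and raise_soft_limit are made up for illustration, and 4096 simply mirrors the ulimit value above.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target=4096):
    """Raise the soft limit towards `target` (hypothetical helper).

    An unprivileged process may raise its soft limit only up to the hard limit,
    so the target is capped accordingly. Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to change
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

Calling raise_soft_limit() at the top of the script only affects that process and the children it starts afterwards. It does not change how many descriptors the script actually consumes, so it is a workaround rather than a fix.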
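Answer 1 (score: 0)
urlopen returns a response object that wraps an open file. Your code never closes these files, hence the problem.
The response object is also a context manager, so instead of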
html = urllib.request.urlopen(each).read()
author_data.append(html)
you can do
with urllib.request.urlopen(each) as response:
    author_data.append(response.read())
to ensure that the file is closed after it has been read.
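Applied to the whole function, a minimal sketch of get_bio with both the HTTP response and the output file managed by with blocks might look like the following. The timeout parameter, the narrower except clause, and the return value are my own additions and not part of the original code.

```python
import urllib.request


def get_bio(authr, urllist, timeout=10):
    """Fetch every URL in urllist and write the raw bodies to '<authr>.txt'.

    Returns the number of pages fetched successfully (added for illustration).
    """
    author_data = []
    for url in urllist:
        try:
            # The with block closes the underlying socket/file as soon as
            # the body has been read, even if read() raises.
            with urllib.request.urlopen(url, timeout=timeout) as response:
                author_data.append(response.read())
        except (OSError, ValueError):
            # URLError is a subclass of OSError; ValueError is raised for
            # URLs without a scheme such as 'www.example.com'.
            continue
    # The output file is closed automatically when the block exits.
    with open(authr + '.txt', 'w') as f:
        for body in author_data:
            f.write(str(body))
            f.write('\n')
            f.write('********************************************')
            f.write('\n')
    return len(author_data)
```

Note that the URLs shown in the question have no http:// prefix. urlopen rejects those with a ValueError, which the bare except in the original code would have silently swallowed.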
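Also, as another user observed in the comments, you should reduce the number of active processes to a reasonable number, because each process opens files at the OS level.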
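One way to cap the number of active processes is a fixed-size multiprocessing.Pool instead of one Process per input line. The sketch below assumes the input has already been parsed into (author, urls) pairs; run_limited and its use_threads switch are hypothetical names, and 8 workers is an arbitrary choice. Since the work is mostly waiting on the network, the thread-backed pool from multiprocessing.pool may be enough.

```python
import multiprocessing as mp
from multiprocessing.pool import ThreadPool


def run_limited(worker, tasks, processes=8, use_threads=False):
    """Call worker(author, urls) for each task with at most `processes` workers.

    tasks is an iterable of (author, urls) tuples (assumed to be parsed already).
    With use_threads=True a thread pool is used instead of worker processes.
    """
    pool_cls = ThreadPool if use_threads else mp.Pool
    with pool_cls(processes=processes) as pool:
        # starmap unpacks each tuple into the worker's arguments and blocks
        # until every task has finished, so the pool can be torn down safely.
        return pool.starmap(worker, tasks)
```

With process workers, the worker function has to be defined at module level so it can be pickled, and the call should stay under the if __name__ == '__main__': guard, as in the original script. Either way, only `processes` workers exist at any time, so the number of pipes and sockets stays bounded no matter how many lines a.txt has.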