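I want to do the following:
I tried to combine the answers from this and this, but with little success. The code for the second queue never gets called, so no disk writes happen. How do I make the processes aware of the second queue?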
Note that I am not tied to any particular approach. If multiprocessing / async works better, I am all for it.
My code so far:
Answer 0 (score: 2)
The first problem I ran into when trying to execute your code was:
An attempt has been made to start a new process before the current process has finished
its bootstrapping phase. This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom in the main module
I had to wrap all module-scope statements in the if __name__ == '__main__': idiom. Read more here.
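As a minimal sketch of that idiom (the names square and main are made up for illustration and do not come from the question): when child processes are started with spawn rather than fork, as on Windows, each child re-imports the main module, so anything that creates processes has to sit behind the guard.

```python
import multiprocessing


def square(x):
    # Hypothetical worker; defined at module level so child processes can import it.
    return x * x


def main():
    # Creating the pool inside a function keeps it out of module scope.
    with multiprocessing.Pool(2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Runs only in the parent process, not when a child re-imports this module.
    print(main())
```

Keeping the pool creation inside main() means the module can also be imported elsewhere without starting any processes.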
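Since your goal is to iterate over the lines of a file, Pool.imap() seems like a good fit. The imap() documentation refers to the map() documentation; the difference is that imap() lazily pulls the next items from the iterable (in your case, the csv file), which matters if your csv file is large. So, from the map() documentation: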
This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks.
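imap() returns an iterator, so you can iterate over the results produced by the worker processes and do something with them (in your example, write the results to a file).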
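To make the chunking and the returned iterator concrete, here is a small sketch. The helper names to_upper and process_lines are hypothetical, and chunksize=2 is an arbitrary choice; imap() uses a chunksize of 1 unless told otherwise.

```python
import multiprocessing


def to_upper(line):
    # Hypothetical stand-in for the real per-row processing.
    return line.upper()


def process_lines(lines, workers=2, chunksize=2):
    # The pool stays open for as long as this generator is being consumed.
    with multiprocessing.Pool(workers) as pool:
        # imap() returns an iterator that yields results in input order;
        # chunksize controls how many items are sent to a worker per task.
        for result in pool.imap(to_upper, lines, chunksize=chunksize):
            yield result


if __name__ == "__main__":
    for row in process_lines(["a", "b", "c", "d", "e"]):
        print(row)
```

A larger chunksize reduces inter-process overhead for very long inputs, at the cost of coarser work distribution between workers.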
Here is a working example:
import multiprocessing
import os
import time


def worker_main(item):
    print(os.getpid(), "got", item)
    time.sleep(1)  # simulate long network processing
    print(os.getpid(), "done", item)
    # return the processed item so the parent process can write it to disk
    return "processed:" + str(item)


if __name__ == '__main__':
    with multiprocessing.Pool(3) as pool:
        with open('out.txt', 'w') as file:
            # range(5) simulates a 5-row csv file
            for proc_row in pool.imap(worker_main, range(5)):
                file.write(proc_row + '\n')
# printed output:
# 1368 got 0
# 9228 got 1
# 12632 got 2
# 1368 done 0
# 1368 got 3
# 9228 done 1
# 9228 got 4
# 12632 done 2
# 1368 done 3
# 9228 done 4
out.txt looks like this:
processed:0
processed:1
processed:2
processed:3
processed:4
Note that I did not have to use any queues either.
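For an actual csv file, the same pattern might look roughly like the sketch below. The function names process_row and run, the file names, and the uppercase transformation are all assumptions for illustration, since the original processing code is not shown.

```python
import csv
import multiprocessing
import os
import tempfile


def process_row(row):
    # Placeholder for the slow per-row work (for example, a network request).
    return ",".join(field.strip().upper() for field in row)


def run(in_path, out_path, workers=3):
    with multiprocessing.Pool(workers) as pool, \
            open(in_path, newline="") as src, \
            open(out_path, "w") as dst:
        # csv.reader is an iterator, so imap() can pull rows from it
        # without the whole file being turned into a list first.
        for processed in pool.imap(process_row, csv.reader(src)):
            # Only the parent process writes to disk, so no second queue is needed.
            dst.write(processed + "\n")


if __name__ == "__main__":
    # A tiny hypothetical input file, created here so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "in.csv")
        out_path = os.path.join(tmp, "out.txt")
        with open(in_path, "w", newline="") as f:
            csv.writer(f).writerows([["a", "b"], ["c", "d"]])
        run(in_path, out_path)
        with open(out_path) as f:
            print(f.read())
```

Because the results come back to the parent in input order, the output file keeps the row order of the input file.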