pathos.multiprocessing is known to have an advantage over Python's multiprocessing library: it uses dill instead of pickle, so it can serialize a wider range of functions and other objects.

But writing pool.map() results to a file line by line with pathos causes some trouble. If all processes in a ProcessPool write their results line by line into a single file, they interfere with each other by writing some lines simultaneously, which spoils the output. With the ordinary multiprocessing package, I was able to make each process write to its own separate file, named after the current process id, like this:
import gzip

example_data = range(100)

def process_point(point):
    # One output file per worker process, named after its pid.
    output = "output-%d.gz" % mpp.current_process().pid
    with gzip.open(output, "a+") as fout:
        fout.write('%d\n' % point**2)
Then, this code works well:
import multiprocessing as mpp
pool = mpp.Pool(8)
pool.map(process_point, example_data)
But this code doesn't:
from pathos import multiprocessing as mpp
pool = mpp.Pool(8)
pool.map(process_point, example_data)
It throws an AttributeError:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-a6fb174ec9a5> in <module>()
----> 1 pool.map(process_point, example_data)

/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.pyc in map(self, func, iterable, chunksize)
    128         '''
    129         assert self._state == RUN
--> 130         return self.mapAsync(func, iterable, chunksize).get()
    131
    132     def imap(self, func, iterable, chunksize=1):

/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.pyc in get(self, timeout)
    371             return self._value
    372         else:
--> 373             raise self._value
    374
    375     def _set(self, i, obj):

AttributeError: 'module' object has no attribute 'current_process'
There is no current_process() in pathos, and I cannot find anything similar to it. Any ideas?
Answer 0 (score: 2)
This simple trick seems to do the job:
import multiprocessing as mp
from pathos import multiprocessing as pathos_mp
import gzip

example_data = range(100)

def process_point(point):
    output = "output-%d.gz" % mp.current_process().pid
    with gzip.open(output, "a+") as fout:
        fout.write('%d\n' % point**2)

pool = pathos_mp.Pool(8)
pool.map(process_point, example_data)
To put it differently, one can use pathos for the parallel computation and the ordinary multiprocessing package to get the id of the current process, and this works correctly!
Answer 1 (score: 2)
I'm the pathos author. While your answer works for this case, it's probably better to use the fork of multiprocessing within pathos, which is found at the rather obtuse location pathos.helpers.mp.

This gives you a one-to-one mapping with multiprocessing, but with better serialization. Thus, you'd use pathos.helpers.mp.current_process.

Sorry, it's both undocumented and not obvious... I should improve at least one of those two issues.
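
For reference, here is a minimal sketch of how the per-process files from the question might look when the pid comes from pathos.helpers.mp instead of the standard library. It is an illustration under a few assumptions, not a tested recipe from the pathos docs: the output_path helper, the (point, directory) argument tuple, the temporary directory, and the "at" text mode (chosen so the write also works on Python 3) are all my own additions, and the ImportError fallback to the standard multiprocessing module is there only so the snippet still runs where pathos is not installed.

```python
import gzip
import os
import tempfile

try:
    # pathos's fork of multiprocessing, as suggested in the answer above.
    from pathos.helpers import mp
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    import multiprocessing as mp


def output_path(directory):
    """Hypothetical helper: one file per worker, named after the worker's pid."""
    return os.path.join(directory, "output-%d.gz" % mp.current_process().pid)


def process_point(args):
    """Append the square of a point to the current process's own file."""
    point, directory = args
    path = output_path(directory)
    # "at" appends in text mode; each append adds a new gzip member to the file.
    with gzip.open(path, "at") as fout:
        fout.write("%d\n" % (point ** 2))
    return path


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    pool = mp.Pool(4)
    try:
        paths = pool.map(process_point, [(p, directory) for p in range(100)])
    finally:
        pool.close()
        pool.join()
    # Each distinct path corresponds to one worker process.
    print(sorted(set(paths)))
```

Since every worker only ever appends to the file carrying its own pid, no two processes write to the same file, which is the same idea as in the question and in answer 0.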