I am trying to use IPython's parallel processing to process data in parallel. I followed @minrk's instructions in his answer to the question how to get intermidiate results in ipython parallel processing?. Since the data is heterogeneous, some processing tasks finish earlier than others, and I would like to save them as soon as they become available. I do this the following way:
from IPython.parallel import Client

def specialfunc(param):
    import time
    if param > 8:
        raise IOError
    else:
        time.sleep(param)
        return param

client = Client()
balanced = client.load_balanced_view()
balanced.block = False
param_list = range(10)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
asyncmap = balanced.map_async(specialfunc, param_list, ordered=False)
I can then iterate over asyncmap and results become available as soon as they are ready:
for i in asyncmap:
    print i
The problem is that my code sometimes throws exceptions (the example above forces an IOError when the calling parameter exceeds 8), which I would like to handle. However, as soon as one of the engines throws an error, the whole asyncmap "appears" to be finished.

I actually noticed that when I interrogate asyncmap.metadata, I can figure out very well which message gave an error (asyncmap.metadata[i]['pyerr']), but then I don't know how to wait for the results to come in as they become ready.

So my question is how I should process my results arriving asynchronously from my engines even if they do sometimes throw exceptions. How do I catch the exceptions in the engines without disturbing the waiting for results in the controller?
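For comparison, the same "handle each failure as it arrives" behavior can be sketched with the standard library's concurrent.futures instead of IPython.parallel (this is only an analogue of the desired pattern, not the IPython API; the worker below just mimics specialfunc without the sleep):

```python
# Standalone sketch: each task's exception is caught individually,
# so results from the other tasks keep arriving as they complete.
from concurrent.futures import ThreadPoolExecutor, as_completed

def specialfunc(param):
    if param > 8:
        raise IOError("simulated engine failure")
    return param

results, errors = [], []
with ThreadPoolExecutor(max_workers=4) as pool:
    # map each future back to the parameter it was called with
    futures = {pool.submit(specialfunc, p): p for p in range(10)}
    for fut in as_completed(futures):
        param = futures[fut]
        try:
            results.append(fut.result())  # re-raises the task's exception here
        except IOError:
            errors.append(param)          # handle the failure, keep waiting

print(sorted(results))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(errors)           # [9]
```

The key point is that `fut.result()` re-raises the exception in the caller, so it can be caught per task without ending the loop over the remaining results.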
Answer 0: (score: 1)
I know it sounds sort of stupid, but you could return a special value to indicate an error, say -1, None, or a string. To get around map_async, what I have done is to loop through the parameters and use apply_async, storing the results in a list. Then, I loop through the list trying to get the results and process them one at a time. It looks something like this:
calls = []
n_cores = len(c.ids)
for n, p in enumerate(params):
    core = c.ids[n % n_cores]
    calls.append(c[core].apply_async(f, p))

# then you get the results
while calls != []:
    for call in calls[:]:  # iterate over a copy so removing is safe
        try:
            result = call.get(1e-3)
            process(result)
            calls.remove(call)
        # in case your call failed, you can apply_async again
        # and append the new call to calls.
        except parallel.TimeoutError:
            pass
Or alternatively use c[core].apply() and check the calls with c.ready(). Basically the same thing without exception handling. The annoying bit is that this takes up a lot of memory, since the results and other dicts of every call are hard to clear.
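The "return a special value to indicate an error" idea from the start of this answer can be sketched without any parallel machinery at all: wrap the worker so failures come back as ordinary sentinel values instead of exceptions, and no iteration over results ever breaks. The wrapper name `safe` is purely illustrative:

```python
# Minimal sketch of the sentinel-value approach: the wrapper converts
# exceptions into a sentinel (None) that the caller checks for.
def specialfunc(param):
    if param > 8:
        raise IOError("simulated failure")
    return param

def safe(param):
    try:
        return specialfunc(param)
    except IOError:
        return None  # sentinel marking a failed task

results = [safe(p) for p in range(10)]
good = [r for r in results if r is not None]
failed = [p for p, r in zip(range(10), results) if r is None]
print(good)    # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(failed)  # [9]
```

In a real map_async call you would submit `safe` instead of `specialfunc`; the caveat is that the sentinel must be a value the worker can never legitimately return.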
I was doing a similar thing here, and I decided map_async just didn't work for me. This might also be relevant, in case you decide to go for this approach.

Cheers.
PS: I think this is essentially what you implemented above, but I find it more natural to deal with the calls individually and then stack them into the map, especially if you might want to reprocess some of them later on.

Answer 1: (score: 0)
Inspired by ipython/*/examples/parallel/customresults.py, I came up with this solution:
asyncmap = balanced.map(specialfunc, param_list, ordered=False)

# create original mapping of msg_ids to parameters
# maybe just a quick way to find which parameter gave what result
msg_ids_to_parameters = dict(zip(asyncmap.msg_ids, param_list))

pending = set(asyncmap.msg_ids)  # all queued jobs are pending
while pending:  # we'll come back as long as finished jobs haven't been looked at yet
    try:
        client.wait(pending, 1e-3)
    except parallel.TimeoutError:
        # ignore timeouterrors, since they only mean that at least one isn't done
        pass
    # finished is the set of msg_ids that are complete
    finished = pending.difference(client.outstanding)
    # update pending to exclude those that just finished
    pending = pending.difference(finished)
    for msg_id in finished:
        # we know these are done, so don't worry about blocking
        ar = client.get_result(msg_id)
        # checking whether any exceptions occurred when code ran on the engine
        if ar.metadata['pyerr'] is None:
            print "job id %s finished on engine %i" % (msg_id, ar.engine_id)
            print "and results for parameter %i :" % msg_ids_to_parameters[msg_id]
            # note that each job in a map always returns a list of length chunksize
            # even if chunksize == 1
            for res in ar.result:
                print " item %i \n" % res
        else:
            print('this went wrong for %i (%s)' % (msg_ids_to_parameters[msg_id], ar.metadata['pyerr']))
Essentially, the change from the example code was to look at the metadata and see whether an error has been recorded, and only if it hasn't, go ahead and retrieve the result via ar.result.
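The same pending-set pattern can be sketched with the standard library's concurrent.futures, where wait(..., FIRST_COMPLETED) plays the role of client.wait(pending, ...) and fut.exception() plays the role of the metadata['pyerr'] check (again only an analogue of the pattern above, not the IPython.parallel API):

```python
# Standard-library analogue of the pending-set loop: repeatedly wait for
# at least one task to finish, then sort each finished task into a
# success or failure bucket while the rest stay outstanding.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def specialfunc(param):
    if param > 8:
        raise IOError("simulated engine error")
    return param

with ThreadPoolExecutor(max_workers=4) as pool:
    # map each future back to its parameter (like msg_ids_to_parameters)
    futures = {pool.submit(specialfunc, p): p for p in range(10)}
    outstanding = set(futures)           # like the pending set of msg_ids
    finished_ok, finished_err = {}, {}
    while outstanding:
        done, outstanding = wait(outstanding, return_when=FIRST_COMPLETED)
        for fut in done:
            param = futures[fut]
            if fut.exception() is None:  # plays the role of metadata['pyerr']
                finished_ok[param] = fut.result()
            else:
                finished_err[param] = fut.exception()

print(sorted(finished_ok))   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(finished_err))  # [9]
```

As in the answer above, checking for the error before retrieving the result is what keeps one failing task from ending the whole collection loop.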