How does IPython Parallel's map function execute a function?

Date: 2015-08-14 05:18:47

Tags: python multithreading parallel-processing ipython

I am using IPython's parallel features:

from ipyparallel import Client  # IPython.parallel before IPython 4.0

client = Client()
client.direct_view().use_dill()

def hostname():
    import socket
    return socket.gethostname()

# For joblib, we only want one client per host. 
joblib_clients = dict(zip(map(lambda x: client[x].apply_sync(hostname), client.ids), client.ids))
lview_joblib = client.load_balanced_view(targets=list(joblib_clients.values()))
dview = client.direct_view()
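
The one-engine-per-host selection above relies on `dict(zip(...))` keeping only the last value seen for each key. A minimal sketch of just that step, with no cluster involved; the helper name `one_engine_per_host` and the host names are hypothetical:

```python
def one_engine_per_host(engine_ids, hostnames):
    """Map each hostname to a single engine id.

    dict() keeps the last value for a repeated key, so the engine that
    appears last for a given host wins.
    """
    return dict(zip(hostnames, engine_ids))


# Hypothetical cluster: four engines spread over two hosts.
engine_ids = [0, 1, 2, 3]
hostnames = ["node-a", "node-a", "node-b", "node-b"]

per_host = one_engine_per_host(engine_ids, hostnames)
targets = sorted(per_host.values())
```

Here `targets` plays the role of the list passed as `targets=` to the load-balanced view.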

I also read data from disk, both locally and on every engine:

%%px --local
import re
import pandas as pd

store = pd.HDFStore(data_file, 'r')
rows = store.select('results', ['cv_score_mean > 0'])
rows = rows.sort_values('cv_score_mean', ascending=False)  # rows.sort(...) in pandas < 0.17
rows['results_index'] = rows.index
data_model = store.select('data_model')
p = re.compile('_coef$')
feature_set =  rows.filter(regex='_coef$').dropna(axis=1, how='all').rename(columns=lambda x: p.sub('',x)).columns

Then I define and instantiate a class on all engines (but not locally):

%%px 
class DataRegression(object):
   def __init__(self, **kwargs): ...
       ...
   def regress_z_scores(self, **kwargs):
       ... 
regressions = DataRegression(data_model, feature_set)

With the load-balanced view, I can call this function in several ways:

1) Through a lambda function:

# What exactly is this lambda function passing to the lview? 
ar = lview_joblib.map(lambda run: regressions.regress_z_scores(run), runs)
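
One way to reason about the lambda case: the name `regressions` inside the lambda is a global lookup that is resolved only when the function body runs, in whatever global namespace the function is bound to at that point. The sketch below illustrates that with plain Python by rebinding a function's code object to a different namespace. It is a simplified model, not ipyparallel's serialization code, and how much of the local namespace travels with the function depends on the serializer in use (the question enables dill). `FakeRegression`, `engine_namespace`, and `rebind_to_namespace` are hypothetical names.

```python
import builtins
import types


def rebind_to_namespace(func, namespace):
    """Return a copy of func whose global names are looked up in namespace."""
    return types.FunctionType(
        func.__code__, namespace, func.__name__,
        func.__defaults__, func.__closure__,
    )


class FakeRegression:
    """Hypothetical stand-in for the DataRegression instance on an engine."""

    def regress_z_scores(self, run):
        return run * 2


# 'regressions' is deliberately not defined in this (client-side) namespace.
task = lambda run: regressions.regress_z_scores(run)

# Pretend this dict is the engine's global namespace.
engine_namespace = {"regressions": FakeRegression(), "__builtins__": builtins}

remote_task = rebind_to_namespace(task, engine_namespace)
result = [remote_task(run) for run in [1, 2, 3]]
```

Under this model the lambda itself carries only the name `regressions`, and the object it refers to is whatever exists under that name where the call finally runs.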

2) Trying to call the function directly:

# This fails with a NameError, because regressions.regress_z_scores is not defined
ar = lview_joblib.map(regressions.regress_z_scores, runs)
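
The `NameError` in this case can be reproduced without any cluster: Python evaluates the argument expression `regressions.regress_z_scores` on the client before `map` is ever called, whereas wrapping it in a lambda defers the lookup until the lambda body runs. A minimal sketch, where `fake_map` is a hypothetical stand-in for the view's `map`:

```python
calls = []


def fake_map(func, seq):
    """Stand-in for a view's map; records that it was actually entered."""
    calls.append("map called")
    return [func(item) for item in seq]


runs = [1, 2, 3]

# 'regressions' does not exist in this namespace, just as in the question.
try:
    fake_map(regressions.regress_z_scores, runs)
    outcome = "no error"
except NameError:
    outcome = "NameError raised before map ran"

# Creating the lambda is fine: the name is only looked up when it is called.
deferred = lambda run: regressions.regress_z_scores(run)
```

Since `calls` stays empty, the failure here has nothing to do with how the function is shipped to the engines.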

3) By also creating the regression object locally:

%%px --local
class DataRegression(object):
   def __init__(self, **kwargs): ...
       ...
   def regress_z_scores(self, **kwargs):
       ... 
regressions = DataRegression(data_model, feature_set)

# And invoking it through the name. 
ar = lview_joblib.map(regressions.regress_z_scores, runs)
# Does this mean that the local object gets pickled and passed to the client each time? 
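
On the pickling question, standard `pickle` in Python 3 serializes a bound method as a reference to its instance plus the method name, so the instance's data travels with it. The sketch below shows only that standard-library behaviour, using a plain list in place of `regressions` so it runs anywhere. Whether and how often ipyparallel re-sends the object for a `map` call depends on its serializer and task chunking, which this example does not model.

```python
import pickle

# A large list stands in for an object holding a lot of data
# (for example a DataRegression instance with data_model attached).
big_object = list(range(100000))
bound_method = big_object.count

payload = pickle.dumps(bound_method)
restored = pickle.loads(payload)

# The restored method is bound to a copy of the original instance.
same_content = restored.__self__ == big_object
same_identity = restored.__self__ is big_object

# The payload is roughly as large as the pickled instance itself.
instance_size = len(pickle.dumps(big_object))
method_size = len(payload)
```

If the same happens per task on a cluster, passing a bound method of a data-heavy object could cost noticeably more than passing a small function that looks the object up by name on the engine.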

In each of these cases, how does the load-balanced view's map function actually execute the function call? Is there a best practice?

0 answers:

No answers yet.