Question

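I am computing data that is relatively computationally intensive for each year. I have used numba (to great effect) to reduce the time the iterations take. However, since I have 20 years of independent data, I would like to split them into 5 groups of 4 that can run across 4 different CPU cores.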

def compute_matrices(self):
    for year in self.years:
        self.xs[year].compute_matrix()

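In the snippet above, the function is a method of a class that has the attributes years and xs. Each year is simply an integer, and xs holds one cross-section object per year; each of those objects carries its data in xs.data and provides a compute_matrix() method.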

What is the easiest way to split this across multiple cores?

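It would be great if there were a Numba-style decorator that could automatically break up the loop, run the pieces in different processes, and glue the results back together. Does this exist?
Is Python's multiprocessing module my best bet?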

Answer 1

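There are a few things you could look into:

NumbaPro: https://store.continuum.io/cshop/accelerate/. This is essentially Numba on steroids, with support for many-core and multi-core architectures. Unfortunately it is not cheap.

Numexpr: https://code.google.com/p/numexpr/. This is an expression evaluator for numpy arrays that implements multithreading.

Numexpr-Numba (experimental): https://github.com/gdementen/numexpr-numba. As the name suggests, this is Numexpr using a Numba backend.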

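A lot of the answer depends on what your compute_matrix method actually does.

The fastest solution in terms of development time is probably just to split your computations using the multiprocessing library. Note that multiprocessing is much easier to use if compute_matrix has no side effects: each worker process operates on its own copy of the data, so results have to be returned to the parent rather than written in place.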
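As a rough illustration of the multiprocessing route, here is a minimal sketch using the standard library's concurrent.futures. Everything in it is an assumption made for illustration: compute_matrix is a hypothetical pure function standing in for the real method, and compute_all, data_by_year and executor_cls are invented names. The function and its inputs must be picklable for the process-based executor.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def compute_matrix(data):
    """Hypothetical stand-in for xs.compute_matrix(): a pure function of its input."""
    return [[a * b for b in data] for a in data]


def compute_all(data_by_year, workers=4, executor_cls=ProcessPoolExecutor):
    """Compute one matrix per year, spreading the years over `workers` workers."""
    years = sorted(data_by_year)
    inputs = [data_by_year[year] for year in years]
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in the same order as the inputs,
        # so they can be paired with their years afterwards.
        matrices = list(pool.map(compute_matrix, inputs))
    # The results are glued together in the parent process.
    return dict(zip(years, matrices))
```

With 20 years and workers=4, the pool hands out the years to the four workers as they become free, so there is no need to build the 5 groups of 4 by hand. When running this as a script on a platform that spawns rather than forks worker processes, the call to compute_all should sit under an `if __name__ == "__main__":` guard. If the heavy work happens in code that releases the GIL (for example a numba function compiled with nogil=True), passing executor_cls=ThreadPoolExecutor may be enough and avoids pickling altogether.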

Answer 2

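The easiest method I have come across for complex objects is to leverage the IPython parallel computing engine.

Simply start an IPython cluster with: ipcluster start -n 4, or start one from the notebook.

You can then distribute the xs objects across the different engines.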

def multicore_compute_matrices(self):
    # IPython.parallel became the separate ipyparallel package in IPython 4.0;
    # on newer installs use: from ipyparallel import Client
    from IPython.parallel import Client
    c = Client()
    years = sorted(self.years)
    # - Ordered List of xs Objects - #
    xs_list = [self.xs[year] for year in years]
    # - Compute across Clusters - #
    results = c[:].map_sync(lambda x: x.compute_matrix(), xs_list)
    # - Assign Results to Current Object - #
    # Pair each result with its year; this also works if years are not consecutive.
    for year, result in zip(years, results):
        self.xs[year].matrix = result

Wall time results from %time:

%time A.compute_matrices()
Wall Time: 5.53s

%time A.multicore_compute_matrices()
Wall Time: 2.58s
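%time is an IPython magic, so it is only available inside IPython or a notebook. To make the same comparison from a plain Python script, a small helper along these lines could be used. This is only a sketch, and wall_time is a hypothetical name, not part of any library.

```python
import time


def wall_time(func, *args, **kwargs):
    """Call func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `_, serial = wall_time(A.compute_matrices)` and `_, parallel = wall_time(A.multicore_compute_matrices)` would give the two numbers to compare, assuming A is the object from the timings above.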

Easiest way to run a simple loop (over different data) across multiple CPU cores in Python?
