Question

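I have never used distributed computing before, but I am trying to integrate mpi4py into a program so that a for loop runs in parallel on a compute cluster.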

Here is pseudocode for what I want to do:

for file in directory:
    Initialize a class
    Run class methods
    Conglomerate results

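I have looked all over Stack Overflow but could not find a solution. Is it possible to do this with mpi4py, or is there another tool that is easy to install and set up?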
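The pseudocode maps fairly directly onto mpi4py. Below is a minimal sketch, assuming every file can be processed independently of the others. `Analysis`, its `run()` method and the `data` directory are hypothetical stand-ins for the class and folder in the pseudocode. Each rank takes every `size`-th file (a round-robin split), and `comm.gather` collects the per-rank results on rank 0, where they are combined.

```python
import os


class Analysis:
    """Hypothetical stand-in for the class in the pseudocode."""

    def __init__(self, path):
        self.path = path

    def run(self):
        # Placeholder work: report the file size in bytes.
        return os.path.getsize(self.path)


def files_for_rank(files, rank, size):
    """Round-robin split: rank r gets files r, r+size, r+2*size, ..."""
    return files[rank::size]


def main(directory="data"):  # "data" is a hypothetical directory name
    from mpi4py import MPI  # imported here so the helpers above work without MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Every rank lists the directory; sorting gives all ranks the same order.
    files = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )

    # Each rank processes only its own share of the files.
    local_results = {
        path: Analysis(path).run() for path in files_for_rank(files, rank, size)
    }

    # Collect every rank's dict on rank 0 (other ranks receive None).
    gathered = comm.gather(local_results, root=0)

    if rank == 0:
        combined = {}
        for part in gathered:
            combined.update(part)
        print(f"processed {len(combined)} files on {size} ranks")


if __name__ == "__main__":
    main()
```

It would be launched with something like `mpiexec -n 4 python script.py`; with a single process the same script simply walks through all files serially.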

Answer 1

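To parallelize a for loop with MPI4Py, take a look at the following code example. It is just a for loop that adds up numbers. The loop is executed by every MPI process (rank), and each rank gets a different block of the range to iterate over. At the end, the process with rank zero adds up the partial results from all ranks.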
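The block boundaries used in the code can be checked without MPI at all. This sketch reproduces the answer's arithmetic (`perrank = b // size`, with rank 0 adding the leftover numbers after the reduction) as plain functions; `chunk_bounds` and `leftover_bounds` are hypothetical helper names. With 3 ranks, each rank sums 333,333 numbers and only the number 1,000,000 is left over.

```python
def chunk_bounds(rank, size, a=1, b=1_000_000):
    """Half-open range [start, stop) summed by `rank`, as in the answer's loop."""
    perrank = b // size
    return a + rank * perrank, a + (rank + 1) * perrank


def leftover_bounds(size, a=1, b=1_000_000):
    """Numbers not covered by any rank; rank 0 adds these after the Reduce."""
    perrank = b // size
    return a + size * perrank, b + 1


size = 3
covered = [chunk_bounds(r, size) for r in range(size)] + [leftover_bounds(size)]
total = sum(sum(range(start, stop)) for start, stop in covered)
print(total)  # 500000500000, i.e. 1_000_000 * 1_000_001 // 2
```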

#!/usr/bin/python

import numpy
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

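# sum the integers a..b; each rank handles a block of perrank consecutive numbers
a = 1
b = 1000000

perrank = b//size
summ = numpy.zeros(1)  # this rank's partial sum, used as the send buffer for Reduce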

comm.Barrier()
start_time = time.time()

# each rank sums its own block [a + rank*perrank, a + (rank+1)*perrank)
temp = 0
for i in range(a + rank*perrank, a + (rank+1)*perrank):
    temp = temp + i

summ[0] = temp

if rank == 0:
    total = numpy.zeros(1)
else:
    total = None

comm.Barrier()
#collect the partial results and add to the total sum
comm.Reduce(summ, total, op=MPI.SUM, root=0)

stop_time = time.time()

if rank == 0:
    #add the leftover numbers (those not in any rank's block) up to 1 000 000
    for i in range(a + (size)*perrank, b+1):
        total[0] = total[0] + i
    print ("The sum of numbers from 1 to 1 000 000: ", int(total[0]))
    print ("time spent with ", size, " MPI processes in milliseconds")
    print ("-----", int((time.time()-start_time)*1000), "-----")
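A variant worth considering, offered as an alternative rather than the answer's method: mpi4py's lowercase `comm.reduce` works on ordinary Python objects, so no NumPy buffers are needed, and spreading the remainder across ranks removes the serial leftover loop on rank 0. This is a sketch; `balanced_range` is a hypothetical helper name.

```python
def balanced_range(rank, size, a=1, b=1_000_000):
    """Split the n = b - a + 1 numbers so chunk sizes differ by at most one."""
    n = b - a + 1
    base, extra = divmod(n, size)
    # The first `extra` ranks each take one additional number.
    start = a + rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def main():
    from mpi4py import MPI  # imported here so balanced_range is usable without MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    start, stop = balanced_range(rank, size)
    partial = sum(range(start, stop))  # Python ints: no overflow, no float rounding

    # Lowercase reduce pickles Python objects; the result is None except on root.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("The sum of numbers from 1 to 1 000 000:", total)


if __name__ == "__main__":
    main()
```

The lowercase methods are convenient for small results such as a single number; for large NumPy arrays the uppercase `Reduce` used in the answer avoids pickling overhead.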

To run the code above, save it as hello_world.py (the name used in the last command) and run it like this:

$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I   # Salomon: ncpus=24:mpiprocs=24
$ ml Python
$ ml OpenMPI
$ mpiexec -bycore -bind-to-core python hello_world.py

In this example, we run the MPI4Py-enabled code on 4 nodes with 16 cores each (64 processes in total), and each Python process is bound to a different core.
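To confirm that the job really starts 64 processes spread 16 per node, a small check can be launched with the same `mpiexec` command in place of `hello_world.py`. This is a sketch: it gathers `MPI.Get_processor_name()` from every rank and counts ranks per host on rank 0; `ranks_per_host` is a hypothetical helper name.

```python
from collections import Counter


def ranks_per_host(hostnames):
    """Count how many MPI ranks landed on each host."""
    return dict(Counter(hostnames))


def main():
    from mpi4py import MPI  # imported here so ranks_per_host works without MPI

    comm = MPI.COMM_WORLD
    host = MPI.Get_processor_name()
    hosts = comm.gather(host, root=0)  # list of hostnames on rank 0, None elsewhere

    if comm.Get_rank() == 0:
        print("total ranks:", comm.Get_size())
        for name, count in sorted(ranks_per_host(hosts).items()):
            print(name, count)


if __name__ == "__main__":
    main()
```

With the `qsub` request above, one would expect four host names, each reported with 16 ranks.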

Sources that may help you:
Submit job with python code (mpi4py) on HPC cluster
https://github.com/JordiCorbilla/mpi4py-examples/tree/master/src/examples/matrix%20multiplication
