I have written an MPI code in C++ for my Raspberry Pi cluster which generates an image of the Mandelbrot set. What happens is that a section of the set is computed on each node (excluding the master, processor 0), leaving each node with a 2D array of ints indicating whether each xy point is in the set.
It appears to run correctly on each node individually, but when all the arrays are gathered to the master with this call: MPI_Gather(&inside, 1, MPI_INT, insideFull, 1, MPI_INT, 0, MPI_COMM_WORLD); it corrupts the data, and the result is an array full of garbage. (inside is a node's partial 2D array; insideFull is also a 2D array, but it holds the whole set.) Why does this happen?
(This leads me to wonder whether it is corrupted because the master doesn't send its own array to itself (or at least I don't want it to). So part of my question is also: is there a variant of MPI_Gather that doesn't send anything from the root process and only collects from everyone else?)
Thanks.
EDIT: Here is the whole code. If anyone can suggest a better way of transferring the arrays, please say so.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
// ONLY USE MULTIPLES OF THE NUMBER OF SLAVE PROCESSORS
#define ImageHeight 128
#define ImageWidth 128
double MinRe = -1.9;
double MaxRe = 0.5;
double MinIm = -1.2;
double MaxIm = MinIm + (MaxRe - MinRe)*ImageHeight / ImageWidth;
double Re_factor = (MaxRe - MinRe) / (ImageWidth - 1);
double Im_factor = (MaxIm - MinIm) / (ImageHeight - 1);
unsigned n;
unsigned MaxIterations = 50;
int red;
int green;
int blue;
// MPI variables ****
int processorNumber;
int processorRank;
//*******************//
int main(int argc, char** argv) {
    // Initialise MPI
    MPI_Init(NULL, NULL);
    // Get the number of processors
    MPI_Comm_size(MPI_COMM_WORLD, &processorNumber);
    // Get the rank of this processor
    MPI_Comm_rank(MPI_COMM_WORLD, &processorRank);
    // Get the name of this processor
    char processorName[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processorName, &name_len);
    // A barrier just to sync all the processors, make timing more accurate
    MPI_Barrier(MPI_COMM_WORLD);
    // Make an array that stores whether each point is in the Mandelbrot Set
    int inside[ImageWidth / processorNumber][ImageHeight / processorNumber];
    if (processorRank == 0) {
        printf("Generating Mandelbrot Set\n");
    }
    // We don't want the master to process the Mandelbrot Set, only the slaves
    if (processorRank != 0) {
        // Determine which coordinates to test on each processor
        int xMin = (ImageWidth / (processorNumber - 1)) * (processorRank - 1);
        int xMax = ((ImageWidth / (processorNumber - 1)) * (processorRank - 1)) - 1;
        int yMin = (ImageHeight / (processorNumber - 1)) * (processorRank - 1);
        int yMax = ((ImageHeight / (processorNumber - 1)) * (processorRank - 1)) - 1;
        // Check each value to see if it's in the Mandelbrot Set
        for (int y = yMin; y <= yMax; y++) {
            double c_im = MaxIm - y * Im_factor;
            for (int x = xMin; x <= xMax; x++) {
                double c_re = MinRe + x * Re_factor;
                double Z_re = c_re, Z_im = c_im;
                int isInside = 1;
                for (n = 0; n <= MaxIterations; ++n) {
                    double Z_re2 = Z_re * Z_re, Z_im2 = Z_im * Z_im;
                    if (Z_re2 + Z_im2 > 10) {
                        isInside = 0;
                        break;
                    }
                    Z_im = 2 * Z_re * Z_im + c_im;
                    Z_re = Z_re2 - Z_im2 + c_re;
                }
                if (isInside == 1) {
                    inside[x][y] = 1;
                }
                else {
                    inside[x][y] = 0;
                }
            }
        }
    }
    // Wait for all processors to finish computing
    MPI_Barrier(MPI_COMM_WORLD);
    int insideFull[ImageWidth][ImageHeight];
    if (processorRank == 0) {
        printf("Sending parts of set to master\n");
    }
    // Send all the arrays to the master
    MPI_Gather(&inside[0][0], 1, MPI_INT, &insideFull[0][0], 1, MPI_INT, 0, MPI_COMM_WORLD);
    // Output the data to an image
    if (processorRank == 0) {
        printf("Generating image\n");
        FILE *image = fopen("mandelbrot_set.ppm", "wb");
        fprintf(image, "P6 %d %d 255\n", ImageHeight, ImageWidth);
        for (int y = 0; y < ImageHeight; y++) {
            for (int x = 0; x < ImageWidth; x++) {
                if (insideFull[x][y]) {
                    putc(0, image);
                    putc(0, image);
                    putc(255, image);
                }
                else {
                    putc(0, image);
                    putc(0, image);
                    putc(0, image);
                }
                // Just to see what values return, no actual purpose
                printf("%d, %d, %d\n", x, y, insideFull[x][y]);
            }
        }
        fclose(image);
        printf("Complete\n");
    }
    MPI_Barrier(MPI_COMM_WORLD);
    // Finalise MPI
    MPI_Finalize();
}
Answer 0 (score: 0)
You call MPI_Gather with the following arguments:
const void* sendbuf : &inside[0][0] Starting address of send buffer
int sendcount : 1 Number of elements in send buffer
const MPI::Datatype& sendtype : MPI_INT Datatype of send buffer elements
void* recvbuf : &insideFull[0][0] Starting address of receive buffer
int recvcount : 1 Number of elements for any single receive
const MPI::Datatype& recvtype : MPI_INT Datatype of receive buffer elements
int root : 0 Rank of receiving process
MPI_Comm comm : MPI_COMM_WORLD Communicator (handle)
Sending/receiving only one element is not enough. Instead of 1, use
(ImageWidth / processorNumber) * (ImageHeight / processorNumber)
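With that count, the call would look like the sketch below (buffer names taken from the question; this is only the count fix, the layout issue discussed next still applies):

```cpp
// Sketch only: corrected element counts, same buffers as in the question.
int blockCount = (ImageWidth / processorNumber) * (ImageHeight / processorNumber);
MPI_Gather(&inside[0][0], blockCount, MPI_INT,
           &insideFull[0][0], blockCount, MPI_INT,
           0, MPI_COMM_WORLD);
```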
Then consider the different memory layouts of the source and destination 2D arrays:
int inside[ImageWidth / processorNumber][ImageHeight / processorNumber];
vs.
int insideFull[ImageWidth][ImageHeight];
Since the copy is a raw block-of-memory copy, not a shape-aware 2D array copy, all the source integers are transferred contiguously to the destination address, regardless of the row sizes of either array.
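To see why this corrupts the image, here is a minimal non-MPI sketch (the simulated_gather helper is hypothetical, not part of MPI) of the placement rule MPI_Gather follows: rank r's block of recvcount ints lands at recvbuf + r * recvcount, and the 2D shapes of the arrays never enter into it.

```cpp
#include <cstring>

// Hypothetical stand-in for MPI_Gather's placement rule: rank r's
// 'count' ints are copied to recvbuf + r * count, in rank order.
// The 2D shape of the source or destination array is irrelevant.
void simulated_gather(const int *sendbufs[], int nranks, int count, int *recvbuf)
{
    for (int r = 0; r < nranks; r++)
        std::memcpy(recvbuf + r * count, sendbufs[r], count * sizeof(int));
}
```

Gathering two 2x2 blocks into a 4x4 array this way fills the first two rows of the destination row by row; the blocks do not land as 2x2 tiles in their own quadrants.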
I suggest first gathering the data into an array with the same dimensions as the source, and then, in the receiving process, copying the elements to the right rows & columns of the full array, for example with a small function like:
// assemble2D():
// copies a source int sarr[sli][sco] into a destination int darr[dli][dco],
// starting at darr[doffli][doffco].
// Elements that fall out of bounds are ignored. Negative offsets are possible.
void assemble2D(int *darr, int dli, int dco, int *sarr, int sli, int sco, int doffli = 0, int doffco = 0)
{
    for (int i = 0; i < sli; i++)
        for (int j = 0; j < sco; j++)
            if ((i + doffli >= 0) && (j + doffco >= 0) && (i + doffli < dli) && (j + doffco < dco))
                darr[(i + doffli)*dco + (j + doffco)] = sarr[i*sco + j];
}
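For example, placing a 2x2 block into a 4x4 array with its top-left corner at row 1, column 1 (the helper is repeated here so the snippet stands alone; the array sizes are made up for illustration):

```cpp
// Helper as above: copies int sarr[sli][sco] into int darr[dli][dco]
// starting at darr[doffli][doffco]; out-of-bounds elements are skipped.
void assemble2D(int *darr, int dli, int dco, int *sarr, int sli, int sco, int doffli = 0, int doffco = 0)
{
    for (int i = 0; i < sli; i++)
        for (int j = 0; j < sco; j++)
            if ((i + doffli >= 0) && (j + doffco >= 0) && (i + doffli < dli) && (j + doffco < dco))
                darr[(i + doffli)*dco + (j + doffco)] = sarr[i*sco + j];
}
```

Calling assemble2D(&full[0][0], 4, 4, &block[0][0], 2, 2, 1, 1) drops the block into the interior of full as a proper 2x2 tile, which is exactly what the flat MPI_Gather copy cannot do on its own.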