I changed my question (since I can't ask another one) from the variables being sent and received to the ones below. Hopefully this is a more appropriate question.
I'm trying to gather data that was split up among the processors, but only a quarter of it comes back correctly. I assign each processor (num loops)/(num procs) loops (the user is forced to pick a processor count that divides the iteration count into an integer number of loops per processor). With 5 processors and 200 iterations, each slave computes all 40 of its values correctly, but somehow only a quarter of them make it back to the master correctly. I scaled it up to 400 iterations and still only a quarter come back to the master correctly. I'm wondering if I need to wait, or somehow make sure the read from one processor has finished before reading from the next? The MPI part of the code is below, with all the math removed.
#define MASTER 0
MPI_Init(NULL,NULL);
MPI_Status status;
int rank,size,name_len;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Get_processor_name(processor_name, &name_len);
chunksize = (NumEnergies / size);
if (rank == MASTER)
{
    offset = chunksize;
    for (dest=1; dest<size; dest++)
    {
        MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
        offset = offset + chunksize;
    }
    //master does its calcs for CrossSections and DiffCrossSections
    for (int i=1; i<size; i++)
    {
        source = i;
        MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
        MPI_Recv(&CrossSections[offset][1], chunksize, MPI_DOUBLE, source, tag2, MPI_COMM_WORLD, &status);
        MPI_Recv(&DiffCrossSections[0][offset+1], (181)*chunksize, MPI_DOUBLE, source, tag3, MPI_COMM_WORLD, &status);
    }
}

if (rank > MASTER)
{
    /* Receive my portion of array from the master task */
    source = MASTER;
    MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);

    std::cout<<"Processor: "<<processor_name<<" rank:"<<rank<<" Energies="<<CrossSections[offset][0]<<" - "<<CrossSections[offset+chunksize-1][0]<<std::endl;

    /* Each task does its part of the work */

    /* Send task results back to the master task */
    dest = MASTER;
    MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
    MPI_Send(&CrossSections[offset][1], chunksize, MPI_DOUBLE, dest, tag2, MPI_COMM_WORLD);
    MPI_Send(&DiffCrossSections[0][offset+1], (181)*chunksize, MPI_DOUBLE, dest, tag3, MPI_COMM_WORLD);
}
MPI_Finalize();
I read out the values in each slave and they are all correct, though of course jumbled. Three quarters of them come back as 0, which is what I initialized the vectors to. So it seems they are just keeping their initialization values rather than actually arriving as garbage. Any hints on what could cause only a quarter of the values to come back correctly? It is a quarter every time; it doesn't change.
Thanks!
Answer 0 (score: 0)
I think the problem is in how the send steps through the memory allocated for the vectors. When I passed the results through a 1D buffer it worked fine, which makes me think the data I'm trying to pass is simply not in one contiguous block of memory. So swapping the indices when allocating the vectors should make the memory I want to access contiguous. Perhaps the way I had it was an "ordered pairs" arrangement rather than the arrangement I actually want. I had to do the same thing for DiffCrossSections.
Edit: I transposed CrossSections and DiffCrossSections in their initialization, and that solved the problem. So it does appear that I was sending data from memory I thought was contiguous but wasn't.
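For reference, here is a minimal sketch of that 1D-buffer workaround, assuming the same chunksize, offset, source, dest and tag2 variables as the snippets in this post; the packing loop is only illustrative, not the exact code I ran:

std::vector<double> sendbuf(chunksize);               // contiguous scratch buffer on the worker
for (int i = 0; i < chunksize; i++)
    sendbuf[i] = CrossSections[offset + i][1];        // gather my slice into one block
MPI_Send(sendbuf.data(), chunksize, MPI_DOUBLE, dest, tag2, MPI_COMM_WORLD);

// on the master: receive into a matching buffer, then copy back into the 2D vector
std::vector<double> recvbuf(chunksize);
MPI_Recv(recvbuf.data(), chunksize, MPI_DOUBLE, source, tag2, MPI_COMM_WORLD, &status);
for (int i = 0; i < chunksize; i++)
    CrossSections[offset + i][1] = recvbuf[i];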
std::vector<std::vector<double > > DiffCrossSections (NumEnergies+1,std::vector<double>(181,0.0));
std::vector< std::vector<double > > CrossSections (2, std::vector<double> (NumEnergies,0.0));
#define MASTER 0
MPI_Init(NULL,NULL);
MPI_Status status;
int rank,size,name_len;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Get_processor_name(processor_name, &name_len);
chunksize = (NumEnergies / size);
if (rank == MASTER)
{
    offset = chunksize;
    for (dest=1; dest<size; dest++)
    {
        MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
        offset = offset + chunksize;
    }
    //master does its calcs for CrossSections and DiffCrossSections
    for (int i=1; i<size; i++)
    {
        source = i;
        MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
        MPI_Recv(&CrossSections[1][offset], chunksize, MPI_DOUBLE, source, tag2, MPI_COMM_WORLD, &status);
        MPI_Recv(&DiffCrossSections[offset+1][0], (181)*chunksize, MPI_DOUBLE, source, tag3, MPI_COMM_WORLD, &status);
    }
}

if (rank > MASTER)
{
    /* Receive my portion of array from the master task */
    source = MASTER;
    MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);

    std::cout<<"Processor: "<<processor_name<<" rank:"<<rank<<" Energies="<<CrossSections[0][offset]<<" - "<<CrossSections[0][offset+chunksize-1]<<std::endl;

    /* Each task does its part of the work */

    /* Send task results back to the master task */
    dest = MASTER;
    MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
    MPI_Send(&CrossSections[1][offset], chunksize, MPI_DOUBLE, dest, tag2, MPI_COMM_WORLD);
    MPI_Send(&DiffCrossSections[offset+1][0], (181)*chunksize, MPI_DOUBLE, dest, tag3, MPI_COMM_WORLD);
}
MPI_Finalize();
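An alternative sketch (not what I ended up using) that also guarantees contiguity is to store each table in a single flat vector and index it by hand, so a block of consecutive rows is one contiguous run of doubles and can be sent with a single call; NumEnergies, chunksize, offset, source, dest and tag3 are assumed to be set up as above:

const int cols = 181;
// one flat buffer indexed as [row*cols + col]
std::vector<double> DiffCrossSectionsFlat((NumEnergies + 1) * cols, 0.0);

// worker: send chunksize whole rows starting at row (offset + 1)
MPI_Send(&DiffCrossSectionsFlat[(offset + 1) * cols], chunksize * cols,
         MPI_DOUBLE, dest, tag3, MPI_COMM_WORLD);

// master: receive straight into the same region of its own flat buffer
MPI_Recv(&DiffCrossSectionsFlat[(offset + 1) * cols], chunksize * cols,
         MPI_DOUBLE, source, tag3, MPI_COMM_WORLD, &status);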