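I'm learning MPI programming and wrote a simple program that passes a message back and forth. I record the send and receive times (in nanoseconds) and noticed something strange: the first few times it sends/receives a message there is a large delay (tens of microseconds), but after more sends/receives that delay disappears and drops to only 1-2 microseconds. Why does this happen?
My program runs on a computer with four cores, and I use two of them to run it. I've created a minimal example to demonstrate: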
    vector<size_t> times;
    times.reserve(100);
    stopwatch s; // Records time since initialization of value
    int counter = 0;
    if (mpi.world_rank == 0)
    {
        // Do this if you're on rank 0
        for (int i = 0; i < 20; ++i)
        {
            ++counter;
            times.push_back(s.age_nano());
            // Send counter (size of 1) to rank 1 with tag 0
            mpi.send(&counter, 1, 1, 0);
            // Receive value (size of 1) from rank 1 with tag 0
            mpi.receive(&counter, 1, 1, 0);
        }
    }
    else if (mpi.world_rank == 1)
    {
        // Otherwise do this if you're on rank 1
        for (int i = 0; i < 20; ++i)
        {
            // Receive value (size of 1) from rank 0 with tag 0
            mpi.receive(&counter, 1, 0, 0);
            ++counter;
            times.push_back(s.age_nano());
            // Send counter (size of 1) to rank 0 with tag 0
            mpi.send(&counter, 1, 0, 0);
        }
    }
    // Turn cumulative timestamps into per-iteration deltas, back to front.
    // Start at the last valid index: times[times.size()] is out of bounds.
    for (int i = static_cast<int>(times.size()) - 1; i > 0; --i) times[i] -= times[i-1];
    cout << times << " Counter: " << counter << endl;
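The final loop above turns cumulative timestamps into per-iteration deltas. A minimal sketch of the same step using std::adjacent_difference from the standard library; to_deltas is a hypothetical helper name and is not part of the original program:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Convert cumulative timestamps into per-iteration deltas.
// The first element is kept as-is (time from stopwatch start to the first sample);
// every later element becomes stamps[i] - stamps[i-1].
std::vector<std::size_t> to_deltas(const std::vector<std::size_t>& stamps)
{
    std::vector<std::size_t> deltas(stamps.size());
    std::adjacent_difference(stamps.begin(), stamps.end(), deltas.begin());
    return deltas;
}
```

Writing into a separate output vector keeps the raw timestamps available and avoids hand-written index arithmetic at the ends of the range.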
When I run the program, I get the following output:
[Code]$ mpic++ main.cc && mpirun -n 2 a.out
{116, 32276, 1288, 665, 674, 633, 662, 661, 570, 651, 560, 564, 610, 602, 635, 636, 13511, 3080, 449, 473} Counter: 40
{23839, 9402, 908, 662, 668, 651, 652, 592, 635, 586, 593, 575, 632, 612, 632, 7120, 8585, 1435, 442, 450} Counter: 40
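As you can see, some of the first few values are much higher than the rest, most of which fall between 500 and 700 nanoseconds. The mpi.send and mpi.receive functions are just very lightweight wrappers around the more standard MPI_Send and MPI_Recv functions. Here is the code for the stopwatch class: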
    #include <chrono>

    struct stopwatch
    {
        typedef decltype(std::chrono::high_resolution_clock::now()) time;
        typedef std::chrono::duration<double, std::ratio<1,1>> seconds;
        typedef std::chrono::duration<double, std::milli> milliseconds;
        typedef std::chrono::duration<double, std::micro> microseconds;
        typedef std::chrono::duration<double, std::nano> nanoseconds;
        time _start = std::chrono::high_resolution_clock::now();
        // Raw tick count in the clock's native period (typically nanoseconds)
        auto age_nano()
        {
            return (std::chrono::high_resolution_clock::now() - _start).count();
        }
        double age_micro()
        {
            return microseconds(std::chrono::high_resolution_clock::now() - _start).count();
        }
        double age_milli()
        {
            return milliseconds(std::chrono::high_resolution_clock::now() - _start).count();
        }
        double age()
        {
            return seconds(std::chrono::high_resolution_clock::now() - _start).count();
        }
        void reset() { _start = std::chrono::high_resolution_clock::now(); }
    };
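One detail worth noting about age_nano: the tick period of std::chrono::high_resolution_clock is implementation-defined, so a bare count() is only in nanoseconds if the clock happens to tick in nanoseconds. A minimal sketch that makes the unit explicit with duration_cast and uses the monotonic steady_clock; to_nanos and elapsed_nanos are hypothetical helper names, not part of the original class:

```cpp
#include <chrono>
#include <cstdint>

// Convert any std::chrono duration to a whole number of nanoseconds.
template <class Rep, class Period>
std::int64_t to_nanos(std::chrono::duration<Rep, Period> d)
{
    return std::chrono::duration_cast<std::chrono::nanoseconds>(d).count();
}

// Elapsed nanoseconds since `start`, measured on a monotonic clock.
inline std::int64_t elapsed_nanos(std::chrono::steady_clock::time_point start)
{
    return to_nanos(std::chrono::steady_clock::now() - start);
}
```

steady_clock is chosen here because it never jumps backwards, which matters when subtracting timestamps that are only a few hundred nanoseconds apart.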
Here is the code for the wrapper I built around MPI:
    #include <mpi.h>
    #include <vector>
    #include <string>

    // Maps a C++ type to the matching MPI datatype
    template<class...> struct get_mpi_type{};
    template<class T> struct get_mpi_type<const T> { static constexpr auto type() { return get_mpi_type<T>::type(); } };
    template<> struct get_mpi_type<short> { static constexpr auto type() { return MPI_SHORT; } };
    template<> struct get_mpi_type<int> { static constexpr auto type() { return MPI_INT; } };
    template<> struct get_mpi_type<long int> { static constexpr auto type() { return MPI_LONG; } };
    template<> struct get_mpi_type<long long int> { static constexpr auto type() { return MPI_LONG_LONG; } };
    template<> struct get_mpi_type<unsigned char> { static constexpr auto type() { return MPI_UNSIGNED_CHAR; } };
    template<> struct get_mpi_type<unsigned short> { static constexpr auto type() { return MPI_UNSIGNED_SHORT; } };
    template<> struct get_mpi_type<unsigned int> { static constexpr auto type() { return MPI_UNSIGNED; } };
    template<> struct get_mpi_type<unsigned long int> { static constexpr auto type() { return MPI_UNSIGNED_LONG; } };
    template<> struct get_mpi_type<unsigned long long int> { static constexpr auto type() { return MPI_UNSIGNED_LONG_LONG; } };
    template<> struct get_mpi_type<float> { static constexpr auto type() { return MPI_FLOAT; } };
    template<> struct get_mpi_type<double> { static constexpr auto type() { return MPI_DOUBLE; } };
    template<> struct get_mpi_type<long double> { static constexpr auto type() { return MPI_LONG_DOUBLE; } };
    template<> struct get_mpi_type<char> { static constexpr auto type() { return MPI_BYTE; } };

    struct mpi_thread
    {
        int world_rank;
        int world_size;
        mpi_thread()
        {
            MPI_Init(NULL, NULL);
            MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
            MPI_Comm_size(MPI_COMM_WORLD, &world_size);
        }
        ~mpi_thread()
        {
            MPI_Finalize();
        }
        template<class T> void send(const T* data, int count, int destination, int tag)
        {
            MPI_Send(data, count, get_mpi_type<T>::type(), destination, tag, MPI_COMM_WORLD);
        }
        template<class T> void send(const std::vector<T>& data, int destination, int tag)
        {
            send(data.data(), static_cast<int>(data.size()), destination, tag);
        }
        template<class T> void send(const std::basic_string<T>& str, int destination, int tag)
        {
            send(str.data(), static_cast<int>(str.size()), destination, tag);
        }
        // Blocks until a matching message is available and returns its status
        MPI_Status probe(int source, int tag)
        {
            MPI_Status status;
            MPI_Probe(source, tag, MPI_COMM_WORLD, &status);
            return status;
        }
        // Number of elements of type T in the probed message
        template<class T> int get_msg_size(MPI_Status& status)
        {
            int num_amnt;
            MPI_Get_count(&status, get_mpi_type<T>::type(), &num_amnt);
            return num_amnt;
        }
        // The status is taken by pointer: MPI_STATUS_IGNORE is a sentinel pointer,
        // so dereferencing it to bind a reference is undefined behavior.
        template<class T> void receive(T* data, int count, int source, int tag, MPI_Status* status = MPI_STATUS_IGNORE)
        {
            MPI_Recv(data, count, get_mpi_type<T>::type(), source, tag, MPI_COMM_WORLD, status);
        }
        template<class T> void receive(std::vector<T>& dest, int source, int tag)
        {
            MPI_Status status = probe(source, tag);
            int size = get_msg_size<T>(status);
            dest.clear();
            dest.resize(size);
            receive(dest.data(), size, source, tag, &status);
        }
        template<class T> void receive(std::basic_string<T>& dest, int source, int tag)
        {
            MPI_Status status = probe(source, tag);
            int size = get_msg_size<T>(status);
            dest.clear();
            dest.resize(size);
            receive(&dest[0], size, source, tag, &status);
        }
    } mpi;
In addition, I overloaded the ostream << operator to print vectors, but that is very basic.
Answer 0 (score: 1)
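If you want to benchmark MPI, you should use a well-known benchmark suite, such as the OSU (Ohio State University) Micro-Benchmarks or the Intel MPI Benchmarks (IMB).
Some MPI libraries establish connections "on demand", which means that the first time a message is sent to a given peer, extra overhead is incurred to set up the connection. There may also be some overhead the first time a given memory region is sent (the memory has to be registered, and that has a cost).
Well-known benchmarks usually run some warm-up iterations before taking the actual measurements, so that these one-time latencies are hidden from the results.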
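The warm-up idea can be sketched as a small generic harness. This is only an illustration under stated assumptions, not how the OSU or IMB suites are implemented: time_after_warmup is a hypothetical name, and op stands for one round trip (for example a lambda that calls the question's mpi.send followed by mpi.receive).

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Run `op` `warmup` times without timing it, then time `iters` further calls.
// Returns one duration per timed call, in nanoseconds.
template <class Op>
std::vector<std::int64_t> time_after_warmup(Op op, int warmup, int iters)
{
    using clock = std::chrono::steady_clock;

    // Warm-up phase: absorbs one-time costs such as connection setup.
    for (int i = 0; i < warmup; ++i) op();

    std::vector<std::int64_t> samples;
    samples.reserve(iters > 0 ? iters : 0);
    for (int i = 0; i < iters; ++i) {
        const auto t0 = clock::now();
        op();
        const auto t1 = clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
    }
    return samples;
}
```

When this is applied to the ping-pong in the question, both ranks have to execute the same total number of exchanges (warmup + iters), otherwise one side blocks waiting for a message that never arrives.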