I have a program that currently generates large arrays and matrices whose size can exceed 10 GB. The program uses MPI to parallelize the workload, but it is limited by the fact that each process needs its own copy of the array or matrix to perform its part of the computation. With a large number of MPI processes the memory requirements become infeasible, so I have been looking into Boost::Interprocess as a way to share data between MPI processes.
So far I have come up with the following, which creates a large vector and parallelizes the summation of its elements:
#include <cstdlib>
#include <ctime>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <unistd.h> // for sleep()
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/tuple/tuple_comparison.hpp>
#include <mpi.h>
// Allocator and vector types whose storage lives inside the managed shared memory segment.
typedef boost::interprocess::allocator<double, boost::interprocess::managed_shared_memory::segment_manager> ShmemAllocator;
typedef boost::interprocess::vector<double, ShmemAllocator> MyVector;

const std::size_t vector_size = 1000000000;
const std::string shared_memory_name = "vector_shared_test.cpp";
int main(int argc, char **argv) {
    int numprocs, rank;
    MPI::Init();
    numprocs = MPI::COMM_WORLD.Get_size();
    rank = MPI::COMM_WORLD.Get_rank();

    if(numprocs >= 2) {
        if(rank == 0) {
            std::cout << "On process rank " << rank << "." << std::endl;
            std::time_t creation_start = std::time(NULL);

            // Remove any stale segment, then create a ~12 GB managed shared memory segment.
            boost::interprocess::shared_memory_object::remove(shared_memory_name.c_str());
            boost::interprocess::managed_shared_memory segment(boost::interprocess::create_only, shared_memory_name.c_str(), size_t(12000000000));
            std::cout << "Size of double: " << sizeof(double) << std::endl;
            std::cout << "Allocated shared memory: " << segment.get_size() << std::endl;

            // Construct the vector inside the segment and fill it with 10^9 doubles.
            const ShmemAllocator alloc_inst(segment.get_segment_manager());
            MyVector *myvector = segment.construct<MyVector>("MyVector")(alloc_inst);
            std::cout << "myvector max size: " << myvector->max_size() << std::endl;
            for(std::size_t i = 0; i < vector_size; i++) {
                myvector->push_back(double(i));
            }
            std::cout << "Vector capacity: " << myvector->capacity() << " | Memory Free: " << segment.get_free_memory() << std::endl;
            std::cout << "Vector creation successful and took " << std::difftime(std::time(NULL), creation_start) << " seconds." << std::endl;
        }
        std::flush(std::cout);
        MPI::COMM_WORLD.Barrier();

        // Every rank (including rank 0) re-opens the segment and sums a strided subset of the elements.
        std::time_t summing_start = std::time(NULL);
        std::cout << "On process rank " << rank << "." << std::endl;
        boost::interprocess::managed_shared_memory segment(boost::interprocess::open_only, shared_memory_name.c_str());
        MyVector *myvector = segment.find<MyVector>("MyVector").first;
        double result = 0;
        for(std::size_t i = rank; i < myvector->size(); i += numprocs) {
            result += (*myvector)[i];
        }

        double total = 0;
        MPI::COMM_WORLD.Reduce(&result, &total, 1, MPI::DOUBLE, MPI::SUM, 0);
        std::flush(std::cout);
        MPI::COMM_WORLD.Barrier();

        if(rank == 0) {
            std::cout << "On process rank " << rank << "." << std::endl;
            std::cout << "Vector summing successful and took " << std::difftime(std::time(NULL), summing_start) << " seconds." << std::endl;
            std::cout << "The arithmetic sum of the elements in the vector is " << total << std::endl;
            segment.destroy<MyVector>("MyVector");
        }
        std::flush(std::cout);
        MPI::COMM_WORLD.Barrier();
        boost::interprocess::shared_memory_object::remove(shared_memory_name.c_str());
    }

    sleep(300); // keep the processes alive so their memory maps can be inspected
    MPI::Finalize();
    return 0;
}
I have noticed that this causes the entire shared object to be mapped into each process's virtual address space, which is a problem on our computing cluster because it limits virtual memory to the same size as physical memory. Is there a way to share this data structure without having to map the whole shared memory segment, perhaps by sharing some kind of pointer? Would attempting to access unmapped shared memory even be defined behavior? Unfortunately, the operations we perform on the array mean that each process eventually needs to access every element in it (though not concurrently). I suppose the shared array could be broken into pieces and the pieces mapped in and out as needed, but that is less than ideal.
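For what it's worth, the kind of partial mapping I have in mind would presumably look something like the sketch below, which uses a raw (unmanaged) shared_memory_object plus boost::interprocess::mapped_region to map only one slice of the data at a time. This is only a guess at an approach, not code from the program above: it assumes a flat array of doubles rather than the managed vector, and it assumes the slice offsets are kept page-aligned.

#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstddef>

namespace bip = boost::interprocess;

// One rank creates and sizes the raw (unmanaged) shared memory once.
void create_shared_array(const char *name, std::size_t n_doubles) {
    bip::shared_memory_object::remove(name);
    bip::shared_memory_object shm(bip::create_only, name, bip::read_write);
    shm.truncate(n_doubles * sizeof(double));
}

// Any rank maps only the doubles in [first, first + count) instead of the whole object.
// The byte offset handed to mapped_region should respect the OS page size, so keeping
// slice boundaries page-aligned is the safe assumption here.
double sum_slice(const char *name, std::size_t first, std::size_t count) {
    bip::shared_memory_object shm(bip::open_only, name, bip::read_only);
    bip::mapped_region region(shm, bip::read_only,
                              first * sizeof(double), count * sizeof(double));
    const double *data = static_cast<const double *>(region.get_address());
    double result = 0;
    for (std::size_t i = 0; i < count; ++i)
        result += data[i];
    return result; // the mapping is released when `region` goes out of scope
}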
Answer 0 (score: 0)
Since the data you want to share is so large, it may be more practical to treat the data as a real file and use file operations to read the portion you need. Then you do not need shared memory to share the file at all; just let each process read directly from the file system.
// Assumes <fstream> is included and `array` is a fixed-size buffer, so sizeof(array) is its size in bytes.
std::ifstream file("data.dat", std::ios::in | std::ios::binary);
file.seekg(someOffset, std::ios::beg);
file.read(reinterpret_cast<char *>(array), sizeof(array));
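A fuller, untested sketch of this idea in the MPI setting of the question might look like the following; the file name data.dat, the raw binary layout of little-endian doubles, the contiguous block decomposition, and the total element count are illustrative assumptions, not part of the original program.

#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <vector>
#include <mpi.h>

// Each rank reads only its own contiguous block of doubles from the file
// instead of mapping a shared copy of the whole array.
int main(int argc, char **argv) {
    MPI::Init(argc, argv);
    const int numprocs = MPI::COMM_WORLD.Get_size();
    const int rank = MPI::COMM_WORLD.Get_rank();

    const std::size_t total_elements = 1000000000; // must match what data.dat actually holds
    const std::size_t chunk = (total_elements + numprocs - 1) / numprocs;
    const std::size_t first = static_cast<std::size_t>(rank) * chunk;
    const std::size_t count = first < total_elements ? std::min(chunk, total_elements - first) : 0;

    // Read this rank's slice of raw doubles.
    std::vector<double> buffer(count);
    std::ifstream file("data.dat", std::ios::in | std::ios::binary);
    file.seekg(static_cast<std::streamoff>(first * sizeof(double)), std::ios::beg);
    file.read(reinterpret_cast<char *>(buffer.data()), count * sizeof(double));

    double local = 0;
    for (std::size_t i = 0; i < count; ++i)
        local += buffer[i];

    // Combine the partial sums on rank 0, as in the original program.
    double total = 0;
    MPI::COMM_WORLD.Reduce(&local, &total, 1, MPI::DOUBLE, MPI::SUM, 0);
    if (rank == 0)
        std::cout << "Sum of all elements: " << total << std::endl;

    MPI::Finalize();
    return 0;
}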