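I am working on a parallel code. In my main function I loop over time, and at the beginning of each step I need to copy a class using its assignment operator. But somehow, at step 4, one of the processors fails with a "double free or corruption" error while the other processors are fine. The errors come from std::set and std::map. Below are the code and part of the main loop.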
class Mesh
{
public:
    const Mesh &operator=(const Mesh &mesh);
    std::set<size_t> ghostSet;
    std::map<size_t, size_t> localIndex;
};
Assignment operator:
const Mesh &Mesh::operator=(const Mesh &mesh)
{
    std::set<size_t>().swap(ghostSet); ///BUG here
    std::map<size_t, size_t>().swap(localIndex); /// BUG sometimes here
    for(auto const &it : mesh.localIndex)
        localIndex[it.first] = it.second;
    for(auto const &it : mesh.ghostSet)
        ghostSet.insert(it);
    return *this;
}
Main function:
int main(int argc, char *argv[])
{
    Mesh ms, ms_gh;
    /// Some operation to ms;
    for(size_t t = 0; t != 10; t++)
    {
        /// Some operation to ms;
        ms_gh = ms;
        /// Some operation to ms_gh;
    }
}
#0 0x00002aaab2405207 in raise () from /lib64/libc.so.6
#1 0x00002aaab24068f8 in abort () from /lib64/libc.so.6
#2 0x00002aaab2447cc7 in __libc_message () from /lib64/libc.so.6
#3 0x00002aaab2450429 in _int_free () from /lib64/libc.so.6
#4 0x000000000041bfba in __gnu_cxx::new_allocator<std::_Rb_tree_node<unsigned long> >::deallocate (this=0x7fffffff8b50, __p=0x7131c0)
at /usr/include/c++/4.8.2/ext/new_allocator.h:110
#5 0x000000000041835c in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_put_node (this=0x7fffffff8b50, __p=0x7131c0)
at /usr/include/c++/4.8.2/bits/stl_tree.h:374
#6 0x000000000041276e in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_destroy_node (this=0x7fffffff8b50, __p=0x7131c0)
at /usr/include/c++/4.8.2/bits/stl_tree.h:422
#7 0x000000000040c8ad in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x7131c0)
at /usr/include/c++/4.8.2/bits/stl_tree.h:1127
#8 0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x72f410)
at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
#9 0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x72b760)
at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
#10 0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x70fce0)
at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
#11 0x00000000004080c4 in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::~_Rb_tree (this=0x7fffffff8b50, __in_chrg=<optimized out>)
at /usr/include/c++/4.8.2/bits/stl_tree.h:671
#12 0x0000000000407bbc in std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >::~set (this=0x7fffffff8b50,
__in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_set.h:90
#13 0x0000000000405003 in Mesh::operator= (this=0x7fffffffa8a0, mesh=...)
at mesh.cpp:73
#14 0x000000000048eb98 in DynamicMesh::reattach_ghost (mpi_comm=1140850688,
ms=..., cn=..., ms_gh=..., gh=..., cn_gh=..., ale=..., t=4)
at dynamicMesh.cpp:273
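In this run, frame #13 of the backtrace corresponds to the line that swaps the std::set.
My question is why this error does not show up at the first step, and why it does not show up on all processors. Also, the error sometimes occurs on the line that deals with the std::map instead.
In addition, the code runs successfully on my macOS and Linux laptops, but not on the HPC cluster.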
Answer 0 (score: 1)
Way too complicated! Step 1: std::set and std::map both have a clear function, so there is no need to swap with empty temporaries:
/* const */ Mesh& Mesh::operator=(Mesh const& other)
// why return const? '*this' isn't const either;
// if anything, you only prevent using the result directly afterwards:
// Mesh x, y;
// (x = y).someNonConstFunction();
{
    //std::set<size_t>().swap(ghostSet); ///BUG here
    //std::map<size_t, size_t>().swap(localIndex); /// BUG sometimes here
    localIndex.clear();
    for(auto const &it : other.localIndex)
        localIndex[it.first] = it.second;
    ghostSet.clear();
    for(auto const &it : other.ghostSet)
        ghostSet.insert(it);
    // note: 'return *this;' is still missing here (undefined behaviour as written); step 2 adds it
}
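The clear calls above were reordered only to better illustrate step 2: std::map and std::set already provide assignment operators, and these do exactly what the clear-and-copy loops do: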
Mesh& Mesh::operator=(Mesh const& other)
{
    //localIndex.clear();
    //for(auto const &it : other.localIndex)
    //    localIndex[it.first] = it.second;
    localIndex = other.localIndex;
    //ghostSet.clear();
    //for(auto const &it : other.ghostSet)
    //    ghostSet.insert(it);
    ghostSet = other.ghostSet;
    // now fixing as well:
    return *this;
}
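Step 3: The operator above now does exactly what the default assignment operator does, except that the default one assigns in the order in which the members are declared, so the set is assigned first and the map second. Assuming the order of assignment does not matter, you end up with: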
class Mesh
{
public:
    // no 'Mesh::' qualification inside the class body
    Mesh& operator=(Mesh const& other) = default;
};
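To make the effect of the defaulted operator concrete, here is a minimal sketch. It assumes a cut-down Mesh that holds only the two containers from the question; runTimeLoop is a hypothetical helper that mimics the time loop, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Cut-down stand-in for the Mesh class from the question (assumption: only these two members).
class Mesh
{
public:
    Mesh& operator=(Mesh const& other) = default; // member-wise copy assignment

    std::set<std::size_t> ghostSet;
    std::map<std::size_t, std::size_t> localIndex;
};

// Hypothetical helper: repeats the assignment once per time step, as in the question's
// main loop, and returns the final size of the copy's ghost set.
std::size_t runTimeLoop()
{
    Mesh ms, ms_gh;
    for (std::size_t t = 0; t != 10; ++t)
    {
        ms.ghostSet.insert(t);        // "some operation to ms"
        ms.localIndex[t] = 2 * t;

        ms_gh = ms;                   // old contents of ms_gh are released and replaced

        ms_gh.ghostSet.insert(1000);  // "some operation to ms_gh"
        assert(ms.ghostSet.count(1000) == 0);   // the source is untouched: deep copy
        assert(ms_gh.localIndex == ms.localIndex);
    }
    return ms_gh.ghostSet.size();
}
```

If a single-process version like this runs cleanly while the full program still crashes, the heap was most likely damaged somewhere else before the assignment ran.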
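"I am working on a parallel code [...]"
Note that in any case, assignment is not thread-safe (and neither was the original code with the loops). Your double-free problem may well be caused simply by concurrent access to the set or the map. You will have to protect your containers against access from other threads while the operator is still active, e.g. with a mutex.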
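You now have two options. The first is to make the class itself thread-safe by guarding every access (getters as well as setters) with a mutex. However, returning anything by reference or by pointer then becomes unsafe, because the lock is no longer held once the getter has returned. Returning by value is no problem.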
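A minimal sketch of the first option, assuming C++11. LockedMesh and its member functions are hypothetical names chosen for illustration; the real Mesh interface is not shown in the question.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>

// Hypothetical thread-safe variant of Mesh: every access goes through the mutex.
class LockedMesh
{
public:
    LockedMesh() = default;

    LockedMesh& operator=(LockedMesh const& other)
    {
        if (this != &other)
        {
            // Lock both objects at once; std::lock avoids deadlock if another
            // thread assigns in the opposite direction at the same time.
            std::lock(mutex_, other.mutex_);
            std::lock_guard<std::mutex> lhs(mutex_, std::adopt_lock);
            std::lock_guard<std::mutex> rhs(other.mutex_, std::adopt_lock);
            ghostSet_ = other.ghostSet_;
            localIndex_ = other.localIndex_;
        }
        return *this;
    }

    void addGhost(std::size_t id)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        ghostSet_.insert(id);
    }

    // Returns a copy: safe, because the caller never touches the member itself.
    // Returning 'std::set<std::size_t> const&' here would hand out a reference
    // that is used after the lock has already been released.
    std::set<std::size_t> ghosts() const
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return ghostSet_;
    }

private:
    mutable std::mutex mutex_;   // mutable so that const getters can lock
    std::set<std::size_t> ghostSet_;
    std::map<std::size_t, std::size_t> localIndex_;
};
```

The price of this design is a copy on every read, which may be noticeable for large sets.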
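The other variant is to leave proper thread synchronisation to the user. This avoids the problem above, because the user locks the mutex before obtaining the reference, keeps it locked for as long as the reference is in use, and only then releases it.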
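A minimal sketch of this second variant, again assuming C++11. SharedMesh, sumLocalIndices and replaceMesh are hypothetical names; the only point is that the caller holds the lock for the whole lifetime of the reference.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>

// Plain, unsynchronised Mesh (assumption: only the two containers from the question).
struct Mesh
{
    std::set<std::size_t> ghostSet;
    std::map<std::size_t, std::size_t> localIndex;
};

// Hypothetical bundle: the mesh plus the mutex that all users agree to lock first.
struct SharedMesh
{
    Mesh mesh;
    std::mutex mutex;
};

// Reader: locks, takes a reference, uses it, and releases the lock only afterwards.
std::size_t sumLocalIndices(SharedMesh& shared)
{
    std::lock_guard<std::mutex> guard(shared.mutex);
    std::map<std::size_t, std::size_t> const& index = shared.mesh.localIndex;
    std::size_t sum = 0;
    for (auto const& entry : index)
        sum += entry.second;
    return sum;
}   // guard is destroyed here, after the last use of 'index'

// Writer: the assignment runs under the same mutex, so no reader can see
// the containers while they are being replaced.
void replaceMesh(SharedMesh& shared, Mesh const& fresh)
{
    std::lock_guard<std::mutex> guard(shared.mutex);
    shared.mesh = fresh;
}
```

This only works if every piece of code that touches the mesh follows the same convention; nothing in the type system enforces it.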
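Both approaches can be improved with a read/write lock, where the write lock is held only while the object is being modified (new items added to the map or the set, or an assignment as above). The critical point is the modification of individual elements: unless the elements provide a mutex or something similar themselves, or can be modified atomically (or with some lock-free algorithm), the write lock needs to be held for that as well.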
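A minimal sketch of the read/write-lock idea, assuming a C++17 compiler for std::shared_mutex (with C++14, std::shared_timed_mutex could be used instead). IndexTable and its member functions are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>

// Hypothetical wrapper around the localIndex map from the question.
class IndexTable
{
public:
    // Read-only access: many threads may hold the shared lock at the same time.
    std::size_t lookup(std::size_t global, std::size_t fallback) const
    {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = localIndex_.find(global);
        return it == localIndex_.end() ? fallback : it->second;
    }

    // Structural modification: needs the exclusive (write) lock.
    void insert(std::size_t global, std::size_t local)
    {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        localIndex_[global] = local;
    }

    // Modifying a single existing element: the mapped std::size_t is neither
    // atomic nor protected by its own mutex, so the write lock is needed here too.
    bool increment(std::size_t global)
    {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        auto it = localIndex_.find(global);
        if (it == localIndex_.end())
            return false;
        ++it->second;
        return true;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::size_t, std::size_t> localIndex_;
};
```

A read/write lock pays off mainly when reads greatly outnumber writes; with frequent writes a plain mutex is usually just as good.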