Assignment operator: double free or corruption (out)

Asked: 2018-08-02 09:49:54

Tags: c++ mpi assignment-operator

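I am working on a parallel code. In my main function I loop over time steps, and at the beginning of each step I need to copy a class using its assignment operator. Somehow, at step 4, one of the processors fails with a "double free or corruption" error while the other processors run fine. The error is raised inside std::set or std::map. The relevant code and part of the main loop are shown below.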

    class Mesh
    {
      public:

        const Mesh &operator=(const Mesh &mesh);

        std::set<size_t> ghostSet;
        std::map<size_t, size_t> localIndex;
    };

Assignment operator:

    const Mesh &Mesh::operator=(const Mesh &mesh)
    {
      std::set<size_t>().swap(ghostSet);  ///BUG here
      std::map<size_t, size_t>().swap(localIndex); /// BUG sometimes here
      for(auto const &it : mesh.localIndex)
        localIndex[it.first] = it.second;
      for(auto const &it : mesh.ghostSet)
        ghostSet.insert(it);
      return *this;
    }

Main function:

    int main(int argc, char *argv[])
    {
      Mesh ms, ms_gh;
      /// Some operation to ms;
      for(size_t t = 0; t != 10; t++)
      {
        /// Some operation to ms;
        ms_gh = ms;
        /// Some operation to ms_gh;
      }
    }

    #0  0x00002aaab2405207 in raise () from /lib64/libc.so.6
    #1  0x00002aaab24068f8 in abort () from /lib64/libc.so.6
    #2  0x00002aaab2447cc7 in __libc_message () from /lib64/libc.so.6
    #3  0x00002aaab2450429 in _int_free () from /lib64/libc.so.6
    #4  0x000000000041bfba in __gnu_cxx::new_allocator<std::_Rb_tree_node<unsigned long> >::deallocate (this=0x7fffffff8b50, __p=0x7131c0)
        at /usr/include/c++/4.8.2/ext/new_allocator.h:110
    #5  0x000000000041835c in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_put_node (this=0x7fffffff8b50, __p=0x7131c0)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:374
    #6  0x000000000041276e in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_destroy_node (this=0x7fffffff8b50, __p=0x7131c0)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:422
    #7  0x000000000040c8ad in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x7131c0)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:1127
    #8  0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x72f410)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
    #9  0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x72b760)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
    #10 0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x70fce0)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
    #11 0x00000000004080c4 in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::~_Rb_tree (this=0x7fffffff8b50, __in_chrg=<optimized out>)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:671
    #12 0x0000000000407bbc in std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >::~set (this=0x7fffffff8b50,
        __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_set.h:90
    #13 0x0000000000405003 in Mesh::operator= (this=0x7fffffffa8a0, mesh=...)
        at mesh.cpp:73
    #14 0x000000000048eb98 in DynamicMesh::reattach_ghost (mpi_comm=1140850688,
        ms=..., cn=..., ms_gh=..., gh=..., cn_gh=..., ale=..., t=4)
        at dynamicMesh.cpp:273

In this case, frame #13 of the backtrace corresponds to the swap of the std::set.

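My question is: why does this error not appear at the first step, and why does it not appear on all processors? In addition, the error sometimes occurs on the line that handles the std::map instead.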

Also, the code runs successfully on my macOS and Linux laptops, but not on the HPC cluster.

1 Answer:

Answer 0: (score: 1)

Too complicated! Step 1: both std::set and std::map have a clear function, so there is no need to swap with an empty temporary:

    /* const*/ Mesh& Mesh::operator=(Mesh const& other)
    // why return const? 'this' isn't const either;
    // if at all, you only prevent using the result directly afterwards:
    // Mesh x, y;
    // (x = y).someNonConstFunction();
    {
        //std::set<size_t>().swap(ghostSet);  ///BUG here
        //std::map<size_t, size_t>().swap(localIndex); /// BUG sometimes here

        localIndex.clear();
        for(auto const &it : other.localIndex)
            localIndex[it.first] = it.second;

        ghostSet.clear();
        for(auto const &it : other.ghostSet)
            ghostSet.insert(it);

        // note: 'return *this;' is still missing here; it is added in step 2
    }

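The clear calls above were reordered only to illustrate step 2 better: std::map and std::set already provide assignment operators, and these do exactly what the clear-and-copy loops do: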

    Mesh& Mesh::operator=(Mesh const& other)
    {
        //localIndex.clear();
        //for(auto const &it : other.localIndex)
        //    localIndex[it.first] = it.second;
        localIndex = other.localIndex;

        //ghostSet.clear();
        //for(auto const &it : other.ghostSet)
        //    ghostSet.insert(it);
        ghostSet = other.ghostSet;

        // now fixing as well:
        return *this;
    }

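Step 3: The operator above now does exactly what the default assignment operator does. The only difference is that the default one assigns members in the order they are declared, so the set is assigned first and the map second. Assuming the assignment order does not matter, you finally end up with: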

    class Mesh
    {
      public:
        // inside the class body the name must not be qualified with Mesh::
        Mesh& operator=(Mesh const& other) = default;
    };
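To make step 3 concrete, here is a minimal sketch of the reduced class with the defaulted operator, exercised in a loop shaped like the one in the question. The helper `copies_match` and the values inserted into the containers are hypothetical stand-ins for the elided "some operation" steps; only `ghostSet` and `localIndex` come from the question.

```cpp
#include <cstddef>
#include <map>
#include <set>

// Reduced Mesh with only the two members shown in the question.
class Mesh
{
  public:
    // Compiler-generated member-wise copy: ghostSet first, then localIndex.
    Mesh &operator=(Mesh const &other) = default;

    std::set<std::size_t> ghostSet;
    std::map<std::size_t, std::size_t> localIndex;
};

// Hypothetical helper: copies ms into ms_gh once per time step and reports
// whether the copy matched the source every time.
bool copies_match(std::size_t steps)
{
    Mesh ms, ms_gh;
    for (std::size_t t = 0; t != steps; ++t)
    {
        ms.ghostSet.insert(t);             // stand-in for "some operation to ms"
        ms.localIndex[t] = 2 * t;

        ms_gh = ms;                        // old contents of ms_gh are released here

        if (ms_gh.ghostSet != ms.ghostSet || ms_gh.localIndex != ms.localIndex)
            return false;

        ms_gh.ghostSet.insert(1000 + t);   // stand-in for "some operation to ms_gh"
    }
    return true;
}
```

Calling `copies_match(10)` mirrors the ten time steps of the original loop. If a class like this still crashes inside the container destructors, the containers were most likely corrupted before the assignment ran, since the operator itself no longer contains any hand-written memory handling.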
  

> I am working on a parallel code [...]

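Note that in any case the assignment is not thread-safe (and neither was the original code with the loops). Your double-free problem most likely arises simply from concurrent access to the set or the map. You will have to protect your map and set from being used while the operator is still active, e.g. via a mutex.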

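You now have two options. The first is to make the class itself thread-safe by guarding every access (getter or setter) with a mutex. However, returning anything by reference or pointer then becomes unsafe, because the lock is no longer held once the getter has returned. If you return by value anyway, there is no problem.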
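A minimal sketch of this first option, assuming C++11 `std::mutex` is available. The class name `LockedMesh` and the members `addGhost` and `ghosts` are hypothetical and not part of the question's code; the getter returns a copy so that nothing outlives the lock.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>

// Hypothetical thread-safe variant of the question's Mesh.
class LockedMesh
{
  public:
    LockedMesh() = default;

    LockedMesh &operator=(LockedMesh const &other)
    {
        if (this != &other)
        {
            // Lock both objects; std::lock avoids deadlock when two threads
            // assign a = b and b = a at the same time.
            std::unique_lock<std::mutex> mine(mutex_, std::defer_lock);
            std::unique_lock<std::mutex> theirs(other.mutex_, std::defer_lock);
            std::lock(mine, theirs);

            ghostSet_ = other.ghostSet_;
            localIndex_ = other.localIndex_;
        }
        return *this;
    }

    void addGhost(std::size_t id)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        ghostSet_.insert(id);
    }

    // Returns a copy: the caller never holds a reference into the guarded set.
    std::set<std::size_t> ghosts() const
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return ghostSet_;
    }

  private:
    mutable std::mutex mutex_;   // mutable so that const getters can lock it
    std::set<std::size_t> ghostSet_;
    std::map<std::size_t, std::size_t> localIndex_;
};
```

The price of this design is a full copy of the set on every call to `ghosts()`, which may be too expensive for large meshes.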

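The other variant is to leave proper thread synchronization to the user. This avoids the problem above, because the user locks the mutex before obtaining the reference, keeps it locked for as long as the reference is in use, and releases it only afterwards.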
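The user-side variant might look like the following sketch. The struct `SharedMesh` and the functions `sum_ghost_ids` and `replace_mesh` are hypothetical names chosen for illustration; the point is only that the lock is taken before the reference is obtained and outlives every use of it.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>

// Plain, unsynchronized mesh as in the question.
struct Mesh
{
    std::set<std::size_t> ghostSet;
    std::map<std::size_t, std::size_t> localIndex;
};

// Hypothetical bundle: the mesh together with the mutex its users agree on.
struct SharedMesh
{
    std::mutex mutex;
    Mesh mesh;
};

// Reader: lock first, then take the reference, and use it only under the lock.
std::size_t sum_ghost_ids(SharedMesh &shared)
{
    std::lock_guard<std::mutex> guard(shared.mutex);
    std::set<std::size_t> const &ghosts = shared.mesh.ghostSet;
    std::size_t sum = 0;
    for (std::size_t id : ghosts)
        sum += id;
    return sum;
}   // the lock is released here, after the reference is no longer used

// Writer: the assignment runs under the same mutex.
void replace_mesh(SharedMesh &shared, Mesh const &source)
{
    std::lock_guard<std::mutex> guard(shared.mutex);
    shared.mesh = source;
}
```

This only works if every piece of code that touches the mesh follows the same locking convention, which the compiler cannot enforce.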

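Both approaches can be improved with a read/write lock, where the write lock is held only while the object is being modified (new items added to the map or set, or an assignment as above). The critical case is the modification of single elements: unless these elements provide a mutex or something similar themselves, or can be modified atomically (or via some lock-free algorithm), the write lock needs to be held for that as well.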
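As a rough sketch of the read/write variant, the following uses `std::shared_timed_mutex` and `std::shared_lock` from C++14 (with C++17, `std::shared_mutex` could be used in the same way). Note that these are not available in the GCC 4.8.2 toolchain visible in the backtrace. The class `IndexTable` and its member functions are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>

// Hypothetical wrapper around a map like the question's localIndex.
class IndexTable
{
  public:
    // Readers take a shared lock, so several lookups can run concurrently.
    bool lookup(std::size_t global, std::size_t &local) const
    {
        std::shared_lock<std::shared_timed_mutex> read(mutex_);
        auto it = localIndex_.find(global);
        if (it == localIndex_.end())
            return false;
        local = it->second;
        return true;
    }

    // Adding an entry restructures the tree: exclusive (write) lock.
    void insert(std::size_t global, std::size_t local)
    {
        std::unique_lock<std::shared_timed_mutex> write(mutex_);
        localIndex_[global] = local;
    }

    // Changing one existing element also takes the write lock, because a
    // plain size_t value cannot be modified atomically.
    bool update(std::size_t global, std::size_t local)
    {
        std::unique_lock<std::shared_timed_mutex> write(mutex_);
        auto it = localIndex_.find(global);
        if (it == localIndex_.end())
            return false;
        it->second = local;
        return true;
    }

  private:
    mutable std::shared_timed_mutex mutex_;
    std::map<std::size_t, std::size_t> localIndex_;
};
```

Whether this pays off depends on the ratio of reads to writes; with frequent writes a plain mutex is usually simpler and not slower.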