Question

I'm trying to track down a bug that sometimes crashes my application in the destructor of this trivial C++ class:

#include <string>

class CrashClass {

public:
    CrashClass(double r1, double s1, double r2, double s2, double r3, double s3, std::string dateTime)
        : mR1(r1), mS1(s1), mR2(r2), mS2(s2), mR3(r3), mS3(s3), mDateTime(dateTime) { }
    CrashClass() : mR1(0), mS1(0), mR2(0), mS2(0), mR3(0), mS3(0) { }
    ~CrashClass() {}

    std::string GetDateTime() { return mDateTime; }

private:
    double mR1, mS1, mR2, mS2, mR3, mS3;
    std::string mDateTime;
};

A bunch of these objects are stored in a standard C++ vector, which is used by a second class:

class MyClass {
    (...)

private:
    vector<CrashClass>    mCrashClassVec;
};

MyClass is created and deallocated as many times as needed.

The code uses C++17 with the latest Xcode 10.1 under macOS 10.14.4.

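All of this is part of a compute-intensive simulation application that runs for hours to days. It runs 12 computations in parallel on a 6-core i7 machine (using macOS's GCD framework), and after a few hours it often crashes with a

pointer being freed was not allocated

error when MyClass calls mCrashClassVec.clear() on its member, i.e.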

frame #0: 0x00007fff769a72f6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00000001004aa80d libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x00007fff769116a6 libsystem_c.dylib`abort + 127
frame #3: 0x00007fff76a1f977 libsystem_malloc.dylib`malloc_vreport + 545
frame #4: 0x00007fff76a1f738 libsystem_malloc.dylib`malloc_report + 151
frame #5: 0x0000000100069448 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__libcpp_deallocate(__ptr=<unavailable>) at new:236 [opt]
frame #6: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator<char>::deallocate(__p=<unavailable>) at memory:1796 [opt]
frame #7: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator_traits<std::__1::allocator<char> >::deallocate(__p=<unavailable>) at memory:1555 [opt]
frame #8: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string() at string:1941 [opt]
frame #9: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string() at string:1936 [opt]
frame #10: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] CrashClass::~CrashClass(this=<unavailable>) at CrashClass.h:61 [opt]
frame #11: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] CrashClass::~CrashClass(this=<unavailable>) at CrashClass.h:61 [opt]
frame #12: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator<CrashClass>::destroy(this=<unavailable>, __p=<unavailable>) at memory:1860 [opt]
frame #13: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] void std::__1::allocator_traits<std::__1::allocator<CrashClass> >::__destroy<CrashClass>(__a=<unavailable>, __p=<unavailable>) at memory:1727 [opt]
frame #14: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] void std::__1::allocator_traits<std::__1::allocator<CrashClass> >::destroy<CrashClass>(__a=<unavailable>, __p=<unavailable>) at memory:1595 [opt]
frame #15: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__vector_base<CrashClass, std::__1::allocator<CrashClass> >::__destruct_at_end(this=<unavailable>, __new_last=0x00000001011ad000) at vector:413 [opt]
frame #16: 0x0000000100069429 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__vector_base<CrashClass, std::__1::allocator<CrashClass> >::clear(this=<unavailable>) at vector:356 [opt]
frame #17: 0x0000000100069422 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::vector<CrashClass, std::__1::allocator<CrashClass> >::clear(this=<unavailable>) at vector:749 [opt]

Side note: the vector being cleared may not contain any elements yet.

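In the stack trace (bt all), I can see other threads operating on their copies of the CrashClass vector, but as far as I can tell from comparing the addresses in the stack traces, all of them really are private copies (by design), i.e. no data is shared between threads.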

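Naturally, the bug only occurs in full production mode, i.e. all attempts to reproduce the crash by

running in DEBUG mode,
running under the Address Sanitizer in lldb (Xcode) (for many hours/overnight),
running under the Thread Sanitizer in lldb (Xcode) (for many hours/overnight),
running a simplified version of the class with only the critical code kept/copied,

failed and did not trigger the crash.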

Why would deallocating a simple member allocated on the stack fail with a "pointer being freed was not allocated" error?

Any additional hints on how to debug this, or on how to trigger the bug more reliably for further investigation, are very welcome.

Update 5/2019

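The bug still crashes the application intermittently, and I'm starting to believe that the problem is actually caused by the Intel data corruption bugs in recent CPU models.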

https://mjtsai.com/blog/2019/05/17/microarchitectural-data-sampling-mds-mitigation/

https://mjtsai.com/blog/2017/06/27/bug-in-skylake-and-kaby-lake-hyper-threading/

https://www.tomshardware.com/news/hyperthreading-kaby-lake-skylake-skylake-x,34876.html

Answer 1

A few things you can try:

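Run the production build single-threaded for a longer time (e.g. a week or two) to see whether it crashes.
Make sure you don't consume all available RAM, keeping in mind that memory may become fragmented.
Make sure your program doesn't leak memory or keep increasing its memory usage the longer it runs.
Add some tracing: add an extra member and set it to a known value in the destructor (that way, if you delete an object twice, you can recognize the pattern).
Try running the program on another platform and with another compiler.
Your compiler or library might contain a bug. Try another (newer) version.
Remove code from the original version until it no longer crashes. This works even better if you can find a sequence that somehow corrupts memory and crashes consistently.
Once it crashes, run the program with exactly the same data (for each thread) and see whether it always crashes at the same place.
Rewrite or validate any unsafe code in your application. Avoid casts, printf and other old-style variadic functions, and any unsafe strcpy and similar functions.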
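Use a checked STL version.
Try an unoptimized release build.
Try an optimized debug build.
Learn the differences between your compiler's DEBUG and RELEASE builds.
Rewrite the problematic code from scratch. Maybe the new version won't have the bug.
Inspect the data when it crashes.
Review your error/exception handling to see whether you are ignoring some potential problem.
Test how the program behaves when it runs out of memory or disk space, when an exception is thrown…
Make sure the debugger stops on every thrown exception, whether it is handled or not.
Make sure the program compiles and runs without warnings, or that you understand them and are sure they don't matter.
Inspect the data when it crashes to see whether it looks sane.
You can reserve memory up front to reduce fragmentation and reallocation. If your program runs for hours, memory may become so fragmented that the system cannot find a large enough block.
Since your program is multithreaded, make sure your runtime library is compatible with that as well.
Make sure you don't share data across threads, or that any shared data is properly protected.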

Destructor sometimes crashes when cleaning up an owned (!) string member
