Deadlock during garbage collection in a mixed C#/C++-CLI application

Date: 2019-05-17 20:56:32

Tags: c# garbage-collection c++-cli deadlock windbg

While testing a C#/C++-CLI application, I ran into a deadlock that appears to trace back to the garbage collector.

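The application runs on Windows 10 LTSC 2019, and most of its libraries target .NET Framework 4.7.2. The deadlock appears intermittently, usually only after at least 8 hours of continuous running under automated tests.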

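The deadlock always seems to involve the same culprit threads and characteristics. Specifically:

  1. The main/UI thread is idle.
  2. A background thread is in the middle of allocating buffer space for incoming image data, which triggers a GC (thread 57 in the examples below).
  3. A second background thread is processing incoming data from a different source (thread 9 in the examples below).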

To diagnose this, I have been running the system until the deadlock reproduces, then capturing and analyzing a process dump.

Using !threads in WinDbg, the following three threads look interesting:

   9    9  4e4 000002093a48abd0    2b220 Preemptive  0000000000000000:0000000000000000 000002091eec3210 0     MTA 
  57   54  f3c 000002098a6c7760    27220 Cooperative 0000000000000000:0000000000000000 000002091eec3210 1     STA (GC) 
  71   74 1b50 0000020983f20770     1600 Preemptive  0000000000000000:0000000000000000 000002091eec3210 0     Ukn 
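
The rest of the analysis keeps switching between three identifiers from these rows: the debugger thread number (first column), the OS thread ID in hex (third column, which !locks reports as OwningThread), and the ThreadOBJ address (fourth column, which can show up as a raw argument in native stack frames). As a sketch only, the following C++ builds that lookup from pasted rows. It assumes the column order shown above and ignores the columns it doesn't need; ThreadRow, parseThreadRow, parseDbgAddress and the find helpers are hypothetical names, not part of SOS or WinDbg.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// The subset of one SOS !threads row used in this analysis (hypothetical type).
struct ThreadRow {
    int dbgId = 0;                // debugger thread number (first column), e.g. 57
    int managedId = 0;            // CLR managed thread ID (second column)
    std::uint32_t osId = 0;       // OS thread ID, printed in hex (e.g. f3c)
    std::uint64_t threadObj = 0;  // address of the CLR Thread object
    std::string gcMode;           // "Preemptive" or "Cooperative"
    bool ownsGc = false;          // row is tagged "(GC)"
};

// Parses one !threads row; returns std::nullopt if the leading columns don't parse.
std::optional<ThreadRow> parseThreadRow(const std::string& line) {
    std::istringstream in(line);
    ThreadRow row;
    std::string state;  // State column (hex flags), read but not used here
    in >> row.dbgId >> row.managedId;             // decimal columns
    in >> std::hex >> row.osId >> row.threadObj;  // hex columns
    in >> state >> row.gcMode;
    if (!in) return std::nullopt;
    row.ownsGc = line.find("(GC)") != std::string::npos;
    return row;
}

// WinDbg prints 64-bit values as 00000209`83f20770; strip the backtick, parse as hex.
std::uint64_t parseDbgAddress(std::string text) {
    text.erase(std::remove(text.begin(), text.end(), '`'), text.end());
    return std::stoull(text, nullptr, 16);
}

// Linear lookups; a dump only has a few dozen threads.
const ThreadRow* findByOsId(const std::vector<ThreadRow>& rows, std::uint32_t osId) {
    for (const auto& row : rows)
        if (row.osId == osId) return &row;
    return nullptr;
}

const ThreadRow* findByThreadObj(const std::vector<ThreadRow>& rows, std::uint64_t obj) {
    for (const auto& row : rows)
        if (row.threadObj == obj) return &row;
    return nullptr;
}
```

For example, parsing the rows for threads 57 and 71 and looking up 00000209`83f20770 (the fourth "Args to Child" value for clr!CLREventBase::WaitEx on thread 57) returns thread 71. Keep in mind that on x64 the "Args to Child" column is not a reliable record of the actual arguments, so this match is only a hint.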

Starting with thread 57, it has the following (trimmed) stack trace:

 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
04 0000004c`9a5fe280 00007ff8`2330a607 : 00000000`00000000 0000004c`9a5fe3c0 00000000`00000001 00000209`83f20770 : clr!CLREventBase::WaitEx+0x7c
05 0000004c`9a5fe310 00007ff8`2330a580 : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`00000001 : clr!WKS::gc_heap::create_bgc_thread+0x6f
06 0000004c`9a5fe340 00007ff8`233bc70e : 00000000`00000001 0000004c`9a5fe3c0 00000000`00000000 00007ff8`23b2bee8 : clr!WKS::gc_heap::prepare_bgc_thread+0x33
07 0000004c`9a5fe370 00007ff8`232a6019 : 00000209`00000001 00000000`00000000 00000000`00000000 00000000`00000000 : clr!WKS::gc_heap::garbage_collect+0x383
08 0000004c`9a5fe3f0 00007ff8`23309cdd : 00000000`000001ea 00000209`00000002 00000000`00000002 00007ff8`231e503d : clr!WKS::GCHeap::GarbageCollectGeneration+0x10d
09 0000004c`9a5fe450 00007ff8`2325f596 : 00007ff8`23309c50 0000004c`9a5fe530 00000000`00000002 00007ff8`231e521c : clr!WKS::GCHeap::GarbageCollect+0x8d
0a 0000004c`9a5fe4a0 00007ff8`2325f50d : 00007ff8`23246e70 00007ff8`23b24c28 00000209`8a6c7760 ffffffff`fffffffe : clr!GCInterface::GarbageCollectModeAny+0x4e
0b 0000004c`9a5fe4f0 00007ff8`2328d314 : 00000209`00000002 00007ff8`23b2fa80 00000000`00700000 00007ff8`231e4ed5 : clr!GCInterface::AddMemoryPressure+0x9e
0c 0000004c`9a5fe570 00007ff8`2067828f : 00000000`00000800 0000004c`9a5fe5f0 0000004c`9a5fe620 00007ff7`c40405e6 : clr!GCInterface::_AddMemoryPressure+0x35
0d 0000004c`9a5fe5c0 00007ff7`44dd5e54 : 00007ff7`c52cc6e8 00000000`00000800 00000209`24f3f050 00007ff7`c403fb3f : mscorlib_ni!System.GC.AddMemoryPressure(Int64)$##6000E97+0x6f
!clrstack
0000004c9a5fe5f0 00007ff83b0df6f4 [InlinedCallFrame: 0000004c9a5fe5f0] System.GC._AddMemoryPressure(UInt64)
0000004c9a5fe5f0 00007ff82067828f [InlinedCallFrame: 0000004c9a5fe5f0] System.GC._AddMemoryPressure(UInt64)
0000004c9a5fe5c0 00007ff82067828f System.GC.AddMemoryPressure(Int64)
0000004c9a5fe720 00007ff744dd5d2c MyFirstManagedCProject.Image..ctor(MyFirstManagedCProject.Imaging.ImageFrame)
0000004c9a5fea30 00007ff7c4dce65a MyCSharpProject.RunImageThread()

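This thread is responsible for collecting/allocating buffers for incoming images. As part of this, the C++-CLI constructor of the image object (MyFirstManagedCProject.Image..ctor) adds memory pressure through GC.AddMemoryPressure. In the deadlocked case, this triggers a garbage collection.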

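However, the garbage collection never gets past trying to create the background GC thread. A naive reading of the arguments suggests it is trying to create thread 00000209`83f20770 (thread 71 above), but the event it waits on is never signaled.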

Looking at thread 71, we have the following:

Kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
03 0000004c`9cdfef60 00007ff8`3b049bbc : 00000000`00000000 0000004c`fffffffa 0000004c`8b438000 00007ff8`381fc1d0 : ntdll!RtlpWaitOnCriticalSection+0xd9
04 0000004c`9cdfefd0 00007ff8`3b049ad0 : 0000004c`9cdff080 0000004c`9cdff090 00007ff8`381fc130 0000004c`9cdff0b0 : ntdll!RtlpEnterCriticalSectionContended+0xdc
05 0000004c`9cdff000 00007ff8`3811856e : 0000004c`9cdff0b0 00007ff8`381fc130 00000000`00000000 00000000`000002d8 : ntdll!RtlEnterCriticalSection+0x40
06 0000004c`9cdff030 00007ff8`381184c3 : 00000209`6e8e4aa0 00000000`00000002 0000004c`9cdff370 0000004c`9cdff0b0 : ucrtbase!__crt_seh_guarded_call<void>::operator()<<lambda_aa87e3671a710a21b5dc78c0bdf72e11>,<lambda_92619d2358a28f41a33ba319515a20b9> & __ptr64,<lambda_6992ecaafeb10aed2b74cb1fae11a551> >+0x32
07 0000004c`9cdff060 00007ff8`381183e5 : 00000209`6e8e4aa0 00007ff8`381fbb70 00000000`00000004 00000000`00000004 : ucrtbase!construct_ptd+0xb7
08 0000004c`9cdff0a0 00007ff8`38124031 : 00000000`00000000 00000209`6e8e4aa0 00000209`6e8e4aa0 00000000`7ffe0384 : ucrtbase!construct_ptd_array+0x29
09 0000004c`9cdff0d0 00007ff8`3b088f07 : 00000209`6e8e4aa0 00000000`7ffe0385 00000000`00000002 00000000`7ffe0385 : ucrtbase!DllMainDispatch+0xa9
0a 0000004c`9cdff100 00007ff8`3b086896 : 00000209`1eea3c90 00007ff8`38110000 0000004c`00000002 00007ff8`38a9f3d0 : ntdll!LdrpCallInitRoutine+0x6f
0b 0000004c`9cdff170 00007ff8`3b0b5745 : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`00000000 : ntdll!LdrpInitializeThread+0x15a

This thread appears to be waiting on a critical section, specifically 00007ff8`381fc1d0.

Looking at the critical sections in the process to find out what is blocking it, we find:

!locks
CritSec ntdll!LdrpLoaderLock+0 at 00007ff83b19f4f8
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       1b50 //71
EntryCount         0
ContentionCount    2
*** Locked

CritSec ucrtbase!_acrt_stdout_buffer+3d0 at 00007ff8381fc1d0
WaiterWoken        No
LockCount          6
RecursionCount     1
OwningThread       4e4 //9
EntryCount         0
ContentionCount    8
*** Locked

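Thread 71 currently holds the loader lock, but the critical section it appears to be waiting on belongs to some kind of I/O buffer (ucrtbase!_acrt_stdout_buffer+3d0) and is held by thread 4e4 (AKA thread 9 from above).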
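
Getting from "thread 71 waits on 00007ff8`381fc1d0" to "thread 9 owns it" means matching that address against the CritSec blocks in !locks, then mapping the owner's OS thread ID back to a debugger thread number via !threads. A small C++ sketch of the first half, assuming the !locks layout shown above (a "CritSec <symbol> at <address>" header followed later by an "OwningThread <hex id>" line); parseLockOwners is a hypothetical helper, not a WinDbg feature.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Maps each critical-section address in !locks output to its OwningThread (OS thread ID).
std::map<std::uint64_t, std::uint32_t> parseLockOwners(const std::string& locksOutput) {
    std::map<std::uint64_t, std::uint32_t> owners;
    std::istringstream in(locksOutput);
    std::string line;
    std::uint64_t current = 0;
    bool haveCurrent = false;
    while (std::getline(in, line)) {
        if (line.rfind("CritSec ", 0) == 0) {
            // "CritSec <symbol> at <address>": the address follows the last " at ".
            auto pos = line.rfind(" at ");
            haveCurrent = pos != std::string::npos;
            if (haveCurrent) current = std::stoull(line.substr(pos + 4), nullptr, 16);
        } else if (haveCurrent && line.rfind("OwningThread", 0) == 0) {
            // Parse the hex ID; any trailing annotation such as "//71" is ignored.
            std::istringstream fields(line.substr(12));
            std::uint32_t osId = 0;
            if (fields >> std::hex >> osId) owners[current] = osId;
        }
    }
    return owners;
}
```

Looking up 0x00007ff8381fc1d0 in the result gives 0x4e4, and the !threads row with OSID 4e4 is debugger thread 9.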

Inspecting thread 9, we see the following:

Kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
05 0000004c`901fd830 00007ff8`2329d4c1 : 00000000`00000006 00000209`3a48abd0 00000000`00000003 00007ff8`00000080 : clr!WKS::GCHeap::WaitUntilGCComplete+0x2b
06 0000004c`901fd860 00007ff8`232a263d : 00000209`3a48abd0 00000000`00000000 00000000`00000000 00000000`000027df : clr!Thread::RareDisablePreemptiveGC+0x180
07 0000004c`901fd8f0 00007ff8`231e4564 : 00000000`00000004 00000000`00000000 00000209`3a48abd0 00000000`00000110 : clr!JIT_RareDisableHelperWorker+0x4d
08 0000004c`901fda40 00007fff`79ea45dc : 00000000`00000004 0000004c`901fdb20 00000000`00000000 00000001`00000000 : clr!JIT_RareDisableHelper+0x14
!clrstack
0000004c901fdaa8 00007ff83b0df6f4 [InlinedCallFrame: 0000004c901fdaa8] .std._Lockit._Lockit_ctor(Int32)
0000004c901fdaa8 00007fff79ea45bd [InlinedCallFrame: 0000004c901fdaa8] .std._Lockit._Lockit_ctor(Int32)
0000004c901fda80 00007fff79ea45bd DomainBoundILStubClass.IL_STUB_PInvoke(Int32)
0000004c901fdb30 00007fff79ea449c .std.use_facet >(std.locale*) [c:\program files (x86)\microsoft visual studio 14.0\vc\include\xlocale @ 559]
0000004c901fdb80 00007fff79ea437d .std.use_facet >(std.locale*)
0000004c901fdc00 00007fff79e9dcd5 .std.basic_filebuf >.open(std.basic_filebuf >*, SByte*, Int32, Int32) [c:\program files (x86)\microsoft visual studio 14.0\vc\include\fstream @ 271]
0000004c901fdc60 00007ff7c47beabb .std.basic_ifstream >.open(std.basic_ifstream >*, SByte*, Int32, Int32) [c:\program files (x86)\microsoft visual studio 14.0\vc\include\fstream @ 896]
0000004c901fdca0 00007ff744d9ca28 .Calculator.run_Algorithm_GetTable_ASCII(Calculator*, Single**)
0000004c901fe330 00007ff744d9bf01 .Calculator.run_Calculate(Calculator*)
0000004c901fe370 00007ff744d9ba38 .perform_Calculation(Int32, Single, Single, Single, Int32, Int32, Int32, Single, Int32, Single, Single, Single, Single, Single, Int32, Int32, SByte*, Int32, Single*, Single*, Single*)
0000004c901fe8f0 00007ff8231e222e [InlinedCallFrame: 0000004c901fe8f0] MySecondManagedCProject.perform_Calculation(Int32, Single, Single, Single, Int32, Int32, Int32, Single, Int32, Single, Single, Single, Single, Single, Int32, Int32, System.String, Int32, Single ByRef, Single ByRef, Single ByRef)
0000004c901fe830 00007ff744d988ec DomainBoundILStubClass.IL_STUB_PInvoke(Int32, Single, Single, Single, Int32, Int32, Int32, Single, Int32, Single, Single, Single, Single, Single, Int32, Int32, System.String, Int32, Single ByRef, Single ByRef, Single ByRef)
0000004c901fec80 00007ff744d954fc MyCSharpProject.Calculate(MyCSharpProject.Settings)
0000004c901fed40 00007ff7c3aaf033 [MulticastFrame: 0000004c901fed40] System.EventHandler.Invoke(System.Object, System.EventArgs)
0000004c901feda0 00007ff744d951b1 MyCSharpProject.OnDataReceived(System.EventArgs)
0000004c901fef50 00007ff7c43e25d5 MyCSharpProject.NotifyThread()

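Thread 9 holds the lock that thread 71 needs; it appears to have taken it while opening a file stream (std::basic_ifstream::open) to load configuration data for the "calculation". Its managed stack shows it inside the P/Invoke stub for the std::_Lockit constructor, and its native stack shows that stub blocked in clr!JIT_RareDisableHelper → clr!WKS::GCHeap::WaitUntilGCComplete, i.e. waiting to return to cooperative mode until the GC finishes. So thread 9 is blocked until the GC completes and can never release the critical section that the file stream appears to be holding.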
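
Put together, the three observations form a cycle in a wait-for graph: 57 waits on 71 (the background GC thread's creation event, per the naive argument reading), 71 waits on 9 (the CRT critical section), and 9 waits on 57 (the GC that 57 is running). As a sketch, the following C++ checks such a hand-built graph for a cycle. The edges are transcribed by hand from the dumps above, and WaitFor/findCycle are hypothetical names; it assumes each blocked thread waits on exactly one other thread.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Wait-for graph: blocked thread -> the thread it is waiting on (hand-transcribed).
using WaitFor = std::map<int, int>;

// Follows "waits for" edges from start. Returns the cycle (in wait order) if the
// chain loops back on itself, or an empty vector if it reaches a thread that isn't waiting.
std::vector<int> findCycle(const WaitFor& waits, int start) {
    std::vector<int> path;
    std::map<int, std::size_t> position;  // thread -> index in path
    int current = start;
    while (true) {
        auto seen = position.find(current);
        if (seen != position.end()) {
            // Drop any lead-in threads that point into the cycle but aren't part of it.
            return std::vector<int>(path.begin() + static_cast<std::ptrdiff_t>(seen->second),
                                    path.end());
        }
        position[current] = path.size();
        path.push_back(current);
        auto next = waits.find(current);
        if (next == waits.end()) return {};
        current = next->second;
    }
}
```

With the edges {57 → 71, 71 → 9, 9 → 57}, findCycle(waits, 57) returns {57, 71, 9}; removing any single edge breaks the cycle, which is why each of the three links is worth double-checking.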

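For now, I'm unclear what to do about this. If my analysis is correct, file I/O in one of my C++ projects can cause a deadlock during a GC. But file I/O is such a common operation that this conclusion seems unlikely: I would expect such a problem to be widespread and severe, not something rare and specific to my application.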

Am I missing something obvious in my diagnosis? Or have I overlooked an avenue of investigation?
