确定COM对象阻塞终结器

时间:2016-12-07 06:01:37

标签: .net multithreading com windbg rpc

我们有一个.NET后台处理应用程序(控制台应用程序),它执行一些任意工作负载(基本上内部用户提供实现“执行”接口的.NET DLL)。然后,后台处理应用程序通过反射加载dll并执行它。

其中一个提供的DLL显然有一个COM对象,但没有正确处理(可能)。

由于处理时间很长,并且我们在主线程上创建了COM对象而没有正确处理,这会导致终结器线程被阻塞,从而导致进程累积大量未释放的句柄(它可以达到数十万)

我已经尝试使用WinDbg调试该进程的转储,我清楚地看到终结器线程在COM调用的主线程上被阻塞。

首先我们看一下终结器线程中的非托管堆栈(我假设尝试与另一个线程通信以完成COM对象):

0:002> kb
 # ChildEBP RetAddr  Args to Child              
00 0372eec0 74a92cc7 00000968 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
01 0372ef34 74a92c02 00000968 ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x99
02 0372ef48 74dd9839 00000968 ffffffff 010846e4 KERNELBASE!WaitForSingleObject+0x12
03 (Inline) -------- -------- -------- -------- combase!MTAThreadWaitForCall+0x43 [d:\blue\com\combase\dcomrem\channelb.cxx @ 5657]
04 0372ef78 74e7d524 01083ef0 0111a198 0372f1e4 combase!MTAThreadDispatchCrossApartmentCall+0x1ed [d:\blue\com\combase\dcomrem\chancont.cxx @ 193]
05 (Inline) -------- -------- -------- -------- combase!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x33a2 [d:\blue\com\combase\dcomrem\channelb.cxx @ 5052]
06 0372f0cc 74d5caea 010846e4 0372f1e4 0372f1ac combase!CRpcChannelBuffer::SendReceive2+0x62d [d:\blue\com\combase\dcomrem\channelb.cxx @ 4796]
07 (Inline) -------- -------- -------- -------- combase!ClientCallRetryContext::SendReceiveWithRetry+0x2e [d:\blue\com\combase\dcomrem\callctrl.cxx @ 1090]
08 (Inline) -------- -------- -------- -------- combase!CAptRpcChnl::SendReceiveInRetryContext+0x81 [d:\blue\com\combase\dcomrem\callctrl.cxx @ 715]
09 0372f12c 74da1789 010846e4 0372f1e4 0372f1ac combase!DefaultSendReceive+0x9e [d:\blue\com\combase\dcomrem\callctrl.cxx @ 671]
0a (Inline) -------- -------- -------- -------- combase!CAptRpcChnl::SendReceive+0x38 [d:\blue\com\combase\dcomrem\callctrl.cxx @ 752]
0b 0372f190 74e7a010 010846e4 0372f1e4 0372f1ac combase!CCtxComChnl::SendReceive+0x248 [d:\blue\com\combase\dcomrem\ctxchnl.cxx @ 735]
0c 0372f1b4 749f5769 0732ade4 0372f210 749f5740 combase!NdrExtpProxySendReceive+0x5c [d:\blue\com\combase\ndr\ndrole\proxy.cxx @ 2017]
0d 0372f1cc 74a76c1b 3037cf71 0732ade4 0372f630 rpcrt4!NdrpProxySendReceive+0x29
0e 0372f5f8 74e79e1e 74d4d8f8 74d52ff0 0372f630 rpcrt4!NdrClientCall2+0x22b
0f 0372f618 74d5c46f 0372f630 00000008 0372f6b8 combase!ObjectStublessClient+0x6c [d:\blue\com\combase\ndr\ndrole\i386\stblsclt.cxx @ 215]
10 0372f628 74da1c07 0732ade4 0372f668 01075c10 combase!ObjectStubless+0xf [d:\blue\com\combase\ndr\ndrole\i386\stubless.asm @ 171]
11 0372f6b8 74da3024 01075c00 74296f1e 0372f77c combase!CObjectContext::InternalContextCallback+0x1e4 [d:\blue\com\combase\dcomrem\context.cxx @ 4428]
12 0372f70c 74296db9 01075c10 74296f1e 0372f77c combase!CObjectContext::ContextCallback+0xbc [d:\blue\com\combase\dcomrem\context.cxx @ 4332]
13 0372f804 74296e8a 74296771 0372f894 30ec4c53 clr!CtxEntry::EnterContext+0x243
14 0372f83c 74296ecb 74296771 0372f894 00000000 clr!RCW::EnterContext+0x3a
15 0372f858 74296ca8 01075cb8 30ec4c07 747f62e8 clr!RCWCleanupList::ReleaseRCWListInCorrectCtx+0xc0
16 0372f8b4 74296453 30ec4d6b 747f5660 00000000 clr!RCWCleanupList::CleanupAllWrappers+0x14d
17 0372f904 742966d8 0103d6d8 00000001 30ec4cb7 clr!SyncBlockCache::CleanupSyncBlocks+0xd0
18 0372f91c 74294a12 30ec4d47 0372fa38 0372fab0 clr!Thread::DoExtraWorkForFinalizer+0x7f
19 0372f964 741ad6b7 0372fab0 0104dbd0 0372fab0 clr!WKS::GCHeap::FinalizerThreadWorker+0x232
1a 0372f978 741ad721 30ec4e73 0372fab0 00000000 clr!Thread::DoExtraWorkForFinalizer+0x1bd
1b 0372fa1c 741ad7ee 30ec4e17 00000000 00000040 clr!Thread::DoExtraWorkForFinalizer+0x240
1c 0372fa78 742bac66 30ec4e9b 742bab00 00000000 clr!Thread::DoExtraWorkForFinalizer+0x5f0
1d 0372faf4 742a5ffd 00000000 00000000 00000000 clr!WKS::GCHeap::FinalizerThreadStart+0x1a0
1e 0372fb94 75127c04 01036258 75127be0 31b17300 clr!Thread::intermediateThreadProc+0x4d
1f 0372fba8 7723ab8f 01036258 33bfcebd 00000000 kernel32!BaseThreadInitThunk+0x24
20 0372fbf0 7723ab5a ffffffff 7721ffcf 00000000 ntdll!__RtlUserThreadStart+0x2f
21 0372fc00 00000000 742a5fb4 01036258 00000000 ntdll!_RtlUserThreadStart+0x1b

我们现在尝试追踪电话(01083ef0):

0:002> dc 010846e4
010846e4  74d4d400 74d61e2c 74d5a65c adb1682c  ...t,..t\..t,h..
010846f4  429b3908 2b59b182 e6a936e1 00000003  .9.B..Y+.6......
01084704  0000002a 04931d40 00000000 01083ef0  *...@........>..
01084714  010849d8 0732cbc0 010910c0 74d4b2e0  .I....2........t
01084724  00070005 00000003 0009b798 00000000  ................
01084734  00000000 0108478c f1eef1ee 00650054  .....G......T.e.
01084744  00680063 00470020 006f006c 00610062  c.h. .G.l.o.b.a.
01084754  002f006c 00610043 00670072 0057006f  l./.C.a.r.g.o.W.

现在我们可以确定进程ID(与我们在自己的进程中相同)和线程ID 0009b798:

0:002> dc 01083ef0
01083ef0  01083f88 74e8ddd0 00008b8c 0009b798  .?.....t........
01083f00  c502aa57 72322f52 5c326394 bf12fc0f  W...R/2r.c2\....
01083f10  5c326394 bf12fc0f 00007400 ffff8b8c  .c2\.....t......
01083f20  7579b889 1e59d93e 04000203 01f31fa0  ..yu>.Y.........
01083f30  010794d0 00000000 00000000 00000000  ................
01083f40  00000001 ffffffff 0102c3b0 0732ade4  ..............2.
01083f50  00000004 00000000 00000000 00000000  ................
01083f60  0067006e 00070005 00000000 00000000  n.g.............

看看线程,瞧,终结者正在调用主线程(0009b798)

0:002> !threads
Index   TID     TEB     StackBase   StackLimit  DeAlloc StackSize   ThreadProc
0   0009b798    0xff03e000  0x00cf0000  0x00ce5000  0x00bf0000  0x0000b000  0x00000000: 
1   000c3e34    0xff03b000  0x015f0000  0x015ef000  0x014f0000  0x00001000  0x00000000: 
2   0005c870    0xff035000  0x03730000  0x0372e000  0x03630000  0x00002000  0x00000000: 
3   0006518c    0xfef06000  0x04a30000  0x04a2e000  0x049f0000  0x00002000  0x00000000: 
4   0006befc    0xfef03000  0x04bf0000  0x04bee000  0x04bb0000  0x00002000  0x00000000: 
5   00080128    0xfef00000  0x04d30000  0x04d2d000  0x04c30000  0x00003000  0x00000000: 
6   000b3ce0    0xfeef4000  0x06310000  0x0630e000  0x06210000  0x00002000  0x7036df4d: System_Data!SNIAsyncWait
7   00056c18    0xfee8f000  0x075a0000  0x0759f000  0x074a0000  0x00001000  0x00000000: 
8   00054374    0xfeef7000  0x0f690000  0x0f68f000  0x0f590000  0x00001000  0x00000000: 
9   0002c1f0    0xfee3e000  0x0c9f0000  0x0c9ed000  0x0c8f0000  0x00003000  0x00000000: 
10  0009ddb8    0xfee80000  0x07ac0000  0x07abe000  0x079c0000  0x00002000  0x00000000: 

在我的WinDbg会话中,我可以在Finalizer callstack上的堆栈帧之间切换,这是一个观察SendReceive2的示例(尽管我可以这样做,但我不知道在哪里寻找或寻找什么):

0:002> dx Debugger.Sessions[0].Processes[35724].Threads[378992].Stack.Frames[6].SwitchTo();dv /t /v
Debugger.Sessions[0].Processes[35724].Threads[378992].Stack.Frames[6].SwitchTo()
@edi              class CRpcChannelBuffer * this = 0x010846e4
0372effc          struct tagRPCOLEMESSAGE * pMessage = 0x0372f1e4
0372f004          unsigned long * pstatus = 0x0372f1ac
0372efb4          struct tagRPC_ERROR_ENUM_HANDLE EnumHandle = struct tagRPC_ERROR_ENUM_HANDLE
0372f008          int fLogEventIsActive = 0n0
<unavailable>     bool bSuccessBeforeClientNotify = <value unavailable>
0372f0a4          struct _GUID iid = {00000134-0000-0000-C000-000000000046}
<unavailable>     HRESULT result = <value unavailable>
@ebx              class CMessageCall * pCall = 0x0111a198
0372f0b4          unsigned long [5] RpcValues = unsigned long [5]
<unavailable>     bool fInstrumentPrematureStubRundown = <value unavailable>
0372f014          struct tagRPC_EXTENDED_ERROR_INFO ErrorInfo = struct tagRPC_EXTENDED_ERROR_INFO
<unavailable>     unsigned long cMax = <value unavailable>
<unavailable>     union WireThat * pOutb = <value unavailable>
0372efcc          int bEnumRpcErrorInfo = 0n0
0372efdc          HRESULT resultSave = 0x00000000
<unavailable>     struct __MIDL_XmitDefs_0003 * pLocalThat = <value unavailable>
<unavailable>     char * stub_data = <value unavailable>
0372efd4          int bSaveResult = 0n0
<unavailable>     long status = <value unavailable>
<unavailable>     struct ISystemDebugEventFire * g_pEventFire = <value unavailable>
<unavailable>     union WPP_PROJECT_CONTROL_BLOCK * WPP_GLOBAL_Control = <value unavailable>

我有两个问题:

第一个:

我想以某种方式确定哪个 COM对象导致此问题,以便我可以建议我们的内部用户在他们的代码中解决问题(如果它不在我们不知道的实用程序DLL中;有源)。我尝试使用终结器堆栈中存在的一些“combase”方法,但是我并不熟悉它,因此我不知道哪个方法可能包含目标COM对象的CLSID。任何来自社区的建议都表示赞赏。

第二个:

鉴于我们的后台处理应用程序可能会受到此类不当行为的影响,有没有办法可以更改它,以便即使主线程执行长时间运行的操作(在此期间以及创建可怕的COM对象),我们仍然可以完成这些COM对象(以及其他对象以及必要的对象)。

P.S。 我在这个上有一个开放的WinDbg,所以可以执行任何必要的命令。

UPDATE1:

根据我追踪RCW的建议,但我不清楚如何进一步:

0:002> !dumpheap -type System.__ComObject
 Address       MT     Size
01a7f348 73386b9c       16     

Statistics:
      MT    Count    TotalSize Class Name
73386b9c        1           16 System.__ComObject
Total 1 objects
0:002> !dumprcw 01a7f348
Managed object:             00000000
Creating thread:            01a7f4b4
IUnknown pointer:           01a7f480
COM Context:                00000000
Managed ref count:          0
IUnknown V-table pointer :  01a7f464 (captured at RCW creation time)
Flags:                      IsDisconnected 
COM interface pointers:
      IP  Context       MT Type
72f6347c 00000000 08000011 System.__ComObject
7326fa50 00000000 73311c2c System.__ComObject
7337a44c 00000000 00010000 System.__ComObject

此外:

0:002> $$>a<C:\IgorM\HandleLeak\COM_FROM_RCW_PUBLIC.txt 01a7f348

 Make sure you are using Symbols and PSSCOR or SOS is loaded…

Closing open log file C:\IgorM\HandleLeak\CommandLog_2_continue.log
Opened log file 'TEMP.LOG'
Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
   16 0110e240            0         0 00000000     none    01a7f348 System.__ComObject
-----------------------------
Total           743
CCW             0
RCW             1
ComClassFactory 0
Free            0
Closing open log file TEMP.LOG
SyncBlock address = 0110e240

This is the COM object referenced by the RCW object from address 0110e240:

741f6114  74252315 clr!AssemblySecurityDescriptor::`vector deleting destructor'
741f6118  741bf3f9 clr!SecurityDescriptorBase<IApplicationSecurityDescriptor>::IsFullyTrusted
741f611c  741d3b9f clr!SecurityDescriptorBase<IApplicationSecurityDescriptor>::CanCallUnmanagedCode
741f6120  742e258e clr!ZapVirtualSection::GetSize
741f6124  741f5b78 clr!SecurityDescriptorBase<IApplicationSecurityDescriptor>::GetDomain
741f6128  741d71b0 clr!AssemblySecurityDescriptor::Resolve
741f612c  741d71ac clr!SecurityDescriptorBase<IApplicationSecurityDescriptor>::IsResolved
741f6130  742f922a clr!AssemblySecurityDescriptor::GetEvidence
741f6134  742d6458 clr!fusion::logging::BindingScope::GetAssemblyName
741f6138  74317958 clr!SecurityDescriptorBase<IApplicationSecurityDescriptor>::SetEvidence
741f613c  741bfefc clr!SecurityDescriptorBase<IApplicationSecurityDescriptor>::GetGrantedPermissionSet
741f6140  741d71d4 clr!AssemblySecurityDescriptor::GetSharedSecDesc
741f6144  742059db clr!AssemblySecurityDescriptor::CanAssert
741f6148  74588b41 clr!AssemblySecurityDescriptor::HasUnrestrictedUIPermission
741f614c  74588b4a clr!AssemblySecurityDescriptor::IsAllCritical
741f6150  74588b6c clr!AssemblySecurityDescriptor::IsAllSafeCritical

0 个答案:

没有答案