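I have an interesting problem: two dumps from two different processes both show managed heap corruption. I am running clr.dll 4.0.30319.1008 (RTMGDR.030319-1000), x64, on Windows 7 x64. !VerifyHeap confirms the corruption: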
0:016> !VerifyHeap
object 000000000367ec60: bad member 0000000004fba740 at 000000000367ec78
curr_object: 000000000528CF90
Last good object: 000000000367ec40
The object is an array with two elements:
0:016> !DumpObj /d 000000000367ec60
Name: System.Object[]
MethodTable: 000007feedf6adf8
EEClass: 000007feedaefc68
Size: 48(0x30) bytes
Array: Rank 1, Number of elements 2, Type CLASS (Print Array)
Element Type:System.Object
Fields:
None
0:016> !DumpArray /d 000000000367ec60
Name: System.Object[]
MethodTable: 000007feedf6adf8
EEClass: 000007feedaefc68
Size: 48(0x30) bytes
Array: Rank 1, Number of elements 2, Type CLASS
Element Methodtable: 000007feedf65a48
[0] 0000000004fba740
[1] 000000000367ec90
The first pointer is the corrupted value. It points to a bool with the value 1, which is not a managed object. That is why the GC bails out.
0:016> db 0000000004fba740-10
00000000`04fba730 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000000`04fba740 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000000`04fba750 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000000`04fba760 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000000`04fba770 00 00 00 00 00 00 00 00-b8 1b f7 ed fe 07 00 00 ................
00000000`04fba780 d0 a7 fb 04 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000000`04fba790 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000000`04fba7a0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0:016> !lno 04fba740
Before: 0000000004fba718 System.Collections.Hashtable+bucket[]
After: 0000000004fba778 System.Collections.Hashtable
Heap local consistency confirmed.
The surrounding objects are not important, since they change randomly from dump to dump.
!GCRoot 0000000367ec60
Scan Thread 16 OSTHread 5fd0
r10:Root: 000000000367ec60(System.Object[])
Scan Thread 17 OSTHread 10cc
RSP:1de4cd58:Root: 000000000367ec60(System.Object[])
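The array itself has no root on the heap (only a register on thread 16 and a stack slot on thread 17 reference it), which means it can be collected. Interestingly, the second object in the array is ThreadLocal data from a thread that has already exited. It looks like the CLR stores ThreadLocal objects in a per-thread object array, which becomes collectable when the thread exits. Thread 17 performs the actual collection, which throws the ExecutionEngineException. Thread 16, however, appears to hold a reference to this array of thread-local data, which should be pinned (it is not) and to which thread 16 should have no access at all.
Thread 16 seems to hold the TLS data of a thread that has already exited, and it may be writing to it.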
OS Thread Id: 0x5fd0 (16)
Child SP IP Call Site
000000001dffdfe8 0000000076eb135a [NDirectMethodFrameStandalone: 000000001dffdfe8] MS.Win32.UnsafeNativeMethods.MsgWaitForMultipleObjects(Int32, IntPtr[], Boolean, Int32, Int32)
000000001dffdfa0 000007fecfa7e1bd DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], Boolean, Int32, Int32)*** WARNING: Unable to verify checksum for UIAutomationClientsideProviders.ni.dll
000000001dffe090 000007fecfa7b28d MS.Internal.AutomationProxies.Misc.MsgWaitForMultipleObjects(Microsoft.Win32.SafeHandles.SafeWaitHandle, Boolean, Int32, Int32)
000000001dffe110 000007fecfab5cdd MS.Internal.AutomationProxies.QueueProcessor.WaitForWork()
000000001dffe1b0 000007feede22f78 System.Threading.ExecutionContext.runTryCode(System.Object)*** WARNING: Unable to verify checksum for mscorlib.ni.dll
000000001dffe8d8 000007fef08044c4 [HelperMethodFrame_PROTECTOBJ: 000000001dffe8d8] System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode, CleanupCode, System.Object)
000000001dffea00 000007feede11661 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000001dffea60 000007feede115ab System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000000001dffeab0 000007feedea6d8d System.Threading.ThreadHelper.ThreadStart()
000000001dffef08 000007fef08044c4 [GCFrame: 000000001dffef08]
000000001dfff2f0 000007fef08044c4 [DebuggerU2MCatchHandlerFrame: 000000001dfff2f0]
This is the stack of the thread performing the GC:
0:017> !DumpStack
OS Thread Id: 0x10cc (17)
Current frame: clr!WKS::gc_heap::mark_object_simple+0x75
Child-SP RetAddr Caller, Callee
000000001de4cce0 000007fef0877fb2 clr!WKS::gc_heap::mark_through_cards_for_segments+0x36b
000000001de4ce50 000007fef0873980 clr!WKS::gc_heap::mark_phase+0x160, calling clr!WKS::gc_heap::mark_through_cards_for_segments
000000001de4ce80 000007fef086fce7 clr!EEJitManager::CleanupCodeHeaps+0x57, calling clr!CrstBase::Leave
000000001de4cea0 000007fef07e3dc1 clr!CrstBase::Leave+0x31, calling clr!GetThread
000000001de4ced0 000007fef0873f3d clr!WKS::gc_heap::gc1+0xae, calling clr!WKS::gc_heap::mark_phase
000000001de4cef0 000007fef0874786 clr!WKS::gc_heap::update_collection_counts+0x16, calling 000000000065006e
000000001de4cf20 000007fef0a1fa56 clr!WKS::gc_heap::garbage_collect+0x42e, calling clr!WKS::gc_heap::gc1
000000001de4cf60 000007feede2d774 (MethodDesc 000007feedaa93b8 +0x124 System.TimeZoneInfo.GetDateTimeNowUtcOffsetFromUtc(System.DateTime, Boolean ByRef)), calling (MethodDesc 000007feedaa8708 +0 System.TimeSpan.Add(System.TimeSpan))
000000001de4cfa0 000007fef07fd4ff clr!SystemNative::__GetSystemTimeAsFileTime+0xf, calling kernel32!GetSystemTimeAsFileTimeStub
000000001de4cff0 000007fef087452e clr!WKS::GCHeap::GarbageCollectGeneration+0x14e, calling clr!WKS::gc_heap::garbage_collect
000000001de4d040 000007fef08734ce clr!WKS::gc_heap::try_allocate_more_space+0x25f, calling clr!WKS::GCHeap::GarbageCollectGeneration
000000001de4d080 000007fef0872f43 clr!WKS::gc_heap::allocate_small+0x158, calling clr!WKS::gc_heap::a_fit_segment_end_p
000000001de4d110 000007fef08731fe clr!FastAllocateObject+0x73e, calling clr!WKS::gc_heap::try_allocate_more_space
000000001de4d1f0 000007fef07fc8b8 clr!JIT_NewFast+0xb8, calling clr!FastAllocateObject
000000001de4d2c8 000007feede3fa80 (MethodDesc 000007feedaaa8e8 +0x40 System.Text.StringBuilder.ExpandByABlock(Int32)), calling clr!JIT_TrialAllocSFastMP_InlineGetThread
0:016> !Threads
ThreadCount: 17
UnstartedThread: 0
BackgroundThread: 13
PendingThread: 0
DeadThread: 1
Hosted Runtime: no
PreEmptive Lock
ID OSID ThreadOBJ State GC GC Alloc Context Domain Count APT Exception
0 1 58e4 0000000000498ba0 2006020 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 STA
2 2 4190 000000000049ee80 b220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Finalizer)
6 3 48d4 000000001ac8bb60 1000220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 Ukn (Threadpool Worker)
8 5 5fbc 000000001aca1970 a009220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Threadpool Completion Port)
9 6 615c 000000001c4b2880 b020 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA
10 7 5818 000000001c4e7bd0 200b220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA
11 8 6e14 000000001c4f0850 7020 Enabled 0000000000000000:0000000000000000 0000000000481df0 2 STA
12 a 683c 000000001c512610 7220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 STA
14 b 6f40 000000001c521120 7220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 STA
15 c 5070 000000001c564760 100a220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Threadpool Worker)
16 d 5fd0 000000000049bc10 b220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA
17 e 10cc 000000001c62e370 b220 Enabled 0000000000000000:0000000000000000 0000000000481df0 2 MTA (GC) System.ExecutionEngineException (0000000002441228)
XXXX f 000000001e102c80 15820 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 Ukn
22 10 158c 000000001e103aa0 1009220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Threadpool Worker)
23 12 47e8 000000001e1048c0 8019220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 Ukn (Threadpool Completion Port)
24 4 58a8 000000001e103390 8019220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 Ukn (Threadpool Completion Port)
25 9 2874 000000001e102570 8009220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Threadpool Completion Port)
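This is all interesting, but I am not sure how to proceed from here. The error occurs on automated test machines, where the test controller process dies about once or twice a day, so I cannot simply attach a debugger to the process and set breakpoints on writes to specific memory locations. Any additional hints on how to corner this one are very welcome. I will collect more dumps so that I can at least do a differential analysis to check which tests might be causing the issue.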
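To me it looks as if the CLR array that holds the thread statics is not pinned, and someone writes an unboxed bool value into the first array element. A CLR object array does not contain values; its elements are normally addresses of managed objects. Here, however, there is only the bool value (1) instead of the usual CLR object with its object header.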
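To illustrate what I would normally expect to find in such an array slot, here is a minimal sketch. It is not taken from the failing process; the names BoxedBoolDemo and DescribeSlot are made up for illustration. It only shows that a bool stored in an object[] through regular managed code is boxed, so the slot holds the address of a System.Boolean object rather than a raw value.

```csharp
using System;

// Illustration only: BoxedBoolDemo and DescribeSlot are made-up names,
// not code from the failing process.
public static class BoxedBoolDemo
{
    // Returns the runtime type name of the object stored in a slot,
    // or "null" if the slot is empty.
    public static string DescribeSlot(object[] slots, int index)
    {
        object value = slots[index];
        return value == null ? "null" : value.GetType().FullName;
    }

    public static void Main()
    {
        object[] slots = new object[2];

        // Assigning a bool to an object slot boxes it: the runtime allocates
        // a System.Boolean object on the managed heap and the array element
        // stores the address of that object, not the value itself.
        slots[0] = true;

        Console.WriteLine(DescribeSlot(slots, 0)); // System.Boolean
        Console.WriteLine(DescribeSlot(slots, 1)); // null
    }
}
```

In the dump, by contrast, element [0] points at memory that starts with the raw byte 01 and has no method table pointer, which is why I suspect the write did not come from ordinary managed code.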
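Can a wrong PInvoke signature cause this behavior? I have seen declarations like
[DllImport( "kernel32.dll" )]
public static extern bool Beep( int frequency_in, int time_in );
which are declared to return a one-byte bool, whereas the native Beep function returns a 4-byte BOOL. Can a wrong PInvoke return type (bool instead of int) cause this kind of problem?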
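For comparison, here is a sketch of the same import with the return marshaling spelled out explicitly. The wrapper class name NativeBeep and the parameter names are my own; the native prototype is BOOL Beep(DWORD dwFreq, DWORD dwDuration). UnmanagedType.Bool stands for the 4-byte Win32 BOOL and UnmanagedType.I1 for a 1-byte boolean.

```csharp
using System.Runtime.InteropServices;

// Illustration only: NativeBeep is a hypothetical wrapper class.
public static class NativeBeep
{
    // Native prototype: BOOL Beep(DWORD dwFreq, DWORD dwDuration);
    // The return attribute requests 4-byte Win32 BOOL marshaling explicitly;
    // UnmanagedType.I1 would request a 1-byte boolean instead.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool Beep(uint frequency, uint duration);
}
```

I have not verified whether the declaration I saw actually behaves differently from this one at run time.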