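I have run into a very strange error.
My application is a WinForms client that connects to a server over WCF. It references several .NET and C++ modules/DLLs.
For some reason, I set ThreadPool.SetMaxThreads(150, 200) in the code. After running for several hours, the client gets disconnected from the server.
After debugging with WinDbg, I found that the thread pool is full of strange threads. As a result, no new thread can be created in the thread pool, and I think WCF also cannot create the thread it needs to connect to the server, which causes the disconnection.
These strange threads look like this: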
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt
XXXX    3  cb8 0043afd8   1400 Preemptive 00000000:00000000 003f3248     0 Ukn
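According to the "Thread, System.Threading.Thread, and !Threads" series on Yun Jin's WebLog and the SSCLI 2.0 source code, the most likely way to end up with such threads is that the CLR creates a new thread in the thread pool and that thread is never resumed.
I want to know why or how one thread, or many threads, could fail to be resumed.
Here are more technical details:
When the CLR creates a new thread in the thread pool, it calls the SetupUnstartedThread method and then the CreateNewThread/CreateNewOSThread methods.
After SetupUnstartedThread, the CLR has created a thread like this: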
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt
XXXX    3    0 0043afd8   1400 Preemptive 00000000:00000000 003f3248     0 Ukn
Its state is 0x1400 (TS_Unstarted | TS_WeOwn), and it has neither an OSID nor a debugger ID (XXXX).
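As a rough aid for reading the State column, here is a minimal sketch that splits a state value into flag names. The bit values are assumed to match the Thread::ThreadState enum in the SSCLI 2.0 threads.h (they may differ in other CLR builds), only a subset of flags is listed, and decodeThreadState / describeThreadState are hypothetical helper names, not CLR functions.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Subset of the CLR Thread::ThreadState bits. ASSUMPTION: values as in the
// SSCLI 2.0 threads.h; verify against the CLR build you are debugging.
struct StateFlag {
    std::uint32_t bit;
    const char* name;
};

static const StateFlag kStateFlags[] = {
    {0x00000020, "TS_LegalToJoin"},
    {0x00000200, "TS_Background"},
    {0x00000400, "TS_Unstarted"},
    {0x00000800, "TS_Dead"},
    {0x00001000, "TS_WeOwn"},
    {0x00002000, "TS_CoInitialized"},
    {0x00004000, "TS_InSTA"},
    {0x00008000, "TS_InMTA"},
};

// Returns the names of all listed flags that are set in `state`,
// ordered from the lowest bit to the highest.
std::vector<std::string> decodeThreadState(std::uint32_t state) {
    std::vector<std::string> names;
    for (const StateFlag& flag : kStateFlags) {
        if (state & flag.bit) {
            names.push_back(flag.name);
        }
    }
    return names;
}

// Joins the decoded flag names with " | " for display.
std::string describeThreadState(std::uint32_t state) {
    std::string out;
    for (const std::string& name : decodeThreadState(state)) {
        if (!out.empty()) {
            out += " | ";
        }
        out += name;
    }
    return out.empty() ? "(no listed flags)" : out;
}
```

Calling describeThreadState(0x1400) yields "TS_Unstarted | TS_WeOwn", which is the combination described above. Bits that are not in the table are silently ignored, so an empty result does not prove that the state is zero.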
After CreateNewThread/CreateNewOSThread, the thread becomes:
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt
XXXX    3  cb8 0043afd8   1400 Preemptive 00000000:00000000 003f3248     0 Ukn
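It now has an OSID but still no debugger ID (XXXX).
In addition, the thread's ExposedObject field is null.
But if the thread is successfully resumed, which means ntdll!LdrInitializeThunk has been called, the thread gets a debugger ID (2):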
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt
   2    3  cb8 0043afd8   1400 Preemptive 00000000:00000000 003f3248     0 Ukn
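This thread's status differs from the faulty one, which has no debugger ID.
Edit for Thomas W
If the option (c) you mentioned is
(c) a special OS thread in the CLR that might run managed code,
then, according to the SSCLI 2.0 source code, when an OS thread wants to enter managed code the CLR calls the SetupThread method, which runs the following code: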
// reset any unstarted bits on the thread object
FastInterlockAnd((ULONG *) &pThread->m_State, ~Thread::TS_Unstarted);
FastInterlockOr((ULONG *) &pThread->m_State, Thread::TS_LegalToJoin);
So the state of such a thread is definitely not 0x1400.
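To make that argument concrete, here is a small sketch that applies the same two bit operations to a plain atomic integer. std::atomic stands in for FastInterlockAnd/FastInterlockOr, the constant values are assumed to match SSCLI 2.0, and applySetupThreadBits is a hypothetical name used only for this illustration.

```cpp
#include <atomic>
#include <cstdint>

// ASSUMPTION: bit values as in the SSCLI 2.0 Thread::ThreadState enum.
constexpr std::uint32_t TS_LegalToJoin = 0x00000020;
constexpr std::uint32_t TS_Unstarted   = 0x00000400;
constexpr std::uint32_t TS_WeOwn       = 0x00001000;

// Models the two interlocked updates quoted from SetupThread.
// This is not CLR code; it only reproduces the bit arithmetic.
std::uint32_t applySetupThreadBits(std::uint32_t initialState) {
    std::atomic<std::uint32_t> state{initialState};
    state.fetch_and(~TS_Unstarted);  // reset the unstarted bit
    state.fetch_or(TS_LegalToJoin);  // mark the thread as legal to join
    return state.load();
}
```

Starting from 0x1400 (TS_Unstarted | TS_WeOwn), the result is 0x1020: the unstarted bit is gone and TS_LegalToJoin is set, so a thread that went through SetupThread cannot still show 0x1400.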
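None of the strange threads has a corresponding thread in the ~ thread list. Therefore, you cannot find them in !runaway either.
Edit 2
Sorry for the late update to this post. The root cause has not been found yet, but I have found a workaround: replacing .NET Framework 4.0 with .NET Framework 4.5.
The following describes in more detail how I found the workaround.
At one point I traced the whole life cycle of one of these strange threads. As we all know, the CLR has a gate thread (a single thread that helps monitor the status of completion port threads and worker threads). When my application starts to go wrong, the gate thread calls clr!ThreadpoolMgr::CreateWorkerThread periodically, which creates a new CLR thread object and a new OS thread object.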
0:004> k
ChildEBP RetAddr
04c8f6f8 6f3ea8ff KERNEL32!CreateThreadStub
04c8f744 6f3ea77b clr!Thread::CreateNewOSThread+0xba
04c8f78c 6f3eabc1 clr!Thread::CreateNewThread+0xa9
04c8f81c 6f4a6aed clr!ThreadpoolMgr::CreateUnimpersonatedThread+0xbb
04c8f83c 6f4a560e clr!ThreadpoolMgr::CreateWorkerThread+0x19
04c8f864 6f4a4457 clr!ThreadpoolMgr::EnsureEnoughWorkersWorking+0x116
04c8f94c 75973c45 clr!ThreadpoolMgr::GateThreadStart+0x431
04c8f958 771a37f5 KERNEL32!BaseThreadInitThunk+0xe
04c8f998 771a37c8 ntdll!__RtlUserThreadStart+0x70
04c8f9b0 00000000 ntdll!_RtlUserThreadStart+0x1b
The new thread looks like this:
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt
XXXX    3  cb8 0043afd8   1400 Preemptive 00000000:00000000 003f3248     0 Ukn
I guessed that this thread might never be resumed. It turned out I was wrong. After a while, the thread called ntdll!LdrInitializeThunk and then ntdll!_RtlUserThreadStart.
0:065> k
ChildEBP RetAddr
1d54f7c0 75973c45 clr!Thread::intermediateThreadProc
1d54f7cc 771a37f5 KERNEL32!BaseThreadInitThunk+0xe
1d54f80c 771a37c8 ntdll!__RtlUserThreadStart+0x70
1d54f824 00000000 ntdll!_RtlUserThreadStart+0x1b
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt
  65    3  cb8 0043afd8   1400 Preemptive 00000000:00000000 003f3248     0 Ukn
After inspecting the argument of clr!Thread::intermediateThreadProc, I found that this thread would call clr!ThreadpoolMgr::WorkerThreadStart.
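Then the magic happened.
After clr!ThreadpoolMgr::WorkerThreadStart finishes, clr!ThreadStore::RemoveThread should normally be called by the finalizer thread before the thread stops. But not this time: there was no clr!ThreadStore::RemoveThread, only the thread exit shown in the stack below. So the corresponding OS thread was destroyed, but the CLR thread still existed.
0:065> k
ChildEBP RetAddr
1889fb04 7716f73a ntdll!LdrpCallInitRoutine+0x14
1889fba8 7716f63b ntdll!LdrShutdownThread+0xe6
1889fbb8 75973c4c ntdll!RtlExitUserThread+0x2a
1889fbc4 771a37f5 KERNEL32!BaseThreadInitThunk+0x15
1889fc04 771a37c8 ntdll!__RtlUserThreadStart+0x70
1889fc1c 00000000 ntdll!_RtlUserThreadStart+0x1b
The CLR thread was still listed like this:
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt
XXXX    3  cb8 0043afd8   1400 Preemptive 00000000:00000000 003f3248     0 Ukn
You may ask why the thread's state did not change. For some reason I did not dig into that at the time, so I cannot give you an answer, but I read the SSCLI 2.0 source code again and made another guess (^_^).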
clr!ThreadpoolMgr::WorkerThreadStart calls clr!SetupThreadPoolThreadNoThrow. Here is a code snippet from clr!SetupThreadPoolThreadNoThrow:
EX_TRY
{
    pThread = SetupThreadPoolThread(typeTPThread);
}
EX_CATCH
{
    if (pHR)
    {
        *pHR = GET_EXCEPTION()->GetHR();
    }
}
EX_END_CATCH(SwallowAllExceptions);
Note the "SwallowAllExceptions". As you can see, this method calls clr!SetupThreadPoolThread. Here is that code snippet:
if (NULL == (pThread = GetThread()))
{
    pThread = SetupInternalThread();
}
if ((pThread != NULL) && ((pThread->m_State & Thread::TS_ThreadPoolThread) == 0))
{
    if (typeTPThread == WorkerThread)
    {
        FastInterlockOr((ULONG *) &pThread->m_State, Thread::TS_ThreadPoolThread | Thread::TS_TPWorkerThread);
    }
    else if (typeTPThread == CompletionPortThread)
    {
        FastInterlockOr((ULONG *) &pThread->m_State, Thread::TS_ThreadPoolThread | Thread::TS_CompletionPortThread);
    }
    else
    {
        FastInterlockOr((ULONG *) &pThread->m_State, Thread::TS_ThreadPoolThread);
    }
}
So I think that if an exception occurs while clr!SetupThreadPoolThread is being called, the exception is swallowed and the thread's state is never changed.
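The guess above can be illustrated with a toy model. This is not CLR code: FakeThread, setupThreadPoolThread and setupThreadPoolThreadNoThrow are hypothetical stand-ins, the flag values are only illustrative, and a plain try/catch(...) is used in place of the EX_TRY / EX_CATCH / SwallowAllExceptions macros. The sketch only shows that a swallowed failure before the flag update leaves the state at 0x1400.

```cpp
#include <cstdint>
#include <stdexcept>

// Illustrative flag values (ASSUMPTION: not verified against any CLR build).
constexpr std::uint32_t TS_Unstarted        = 0x00000400;
constexpr std::uint32_t TS_WeOwn            = 0x00001000;
constexpr std::uint32_t TS_ThreadPoolThread = 0x00800000;
constexpr std::uint32_t TS_TPWorkerThread   = 0x01000000;

// Hypothetical stand-in for a CLR thread object in the strange state.
struct FakeThread {
    std::uint32_t state = TS_Unstarted | TS_WeOwn;  // 0x1400
};

// Hypothetical stand-in for SetupThreadPoolThread: it can fail before the
// flag update is reached, as an early setup failure would.
void setupThreadPoolThread(FakeThread& thread, bool failEarly) {
    if (failEarly) {
        throw std::runtime_error("simulated failure during thread setup");
    }
    thread.state |= TS_ThreadPoolThread | TS_TPWorkerThread;
}

// Same shape as the NoThrow wrapper: every exception is swallowed and the
// caller only learns that setup did not succeed.
bool setupThreadPoolThreadNoThrow(FakeThread& thread, bool failEarly) {
    try {
        setupThreadPoolThread(thread, failEarly);
        return true;
    } catch (...) {
        return false;  // swallowed; thread.state is left untouched
    }
}
```

With failEarly set to true the wrapper returns false and the state is still 0x1400; with false the pool flags are set. Whether the real CLR hits such a path in this scenario remains the unverified guess described above.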
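So, for the first time, I suspected there might be a small defect in the .NET Framework that my application triggers. Meanwhile, one of my colleagues told me that he could not reproduce the bug. After checking his environment, I found that he was using .NET Framework 4.5.
So far, after upgrading the .NET Framework, the error has not occurred again.
Answer 0 (score: 1)
To see how .NET creates managed threads and flags them as XXX, you can run the following code. Compile the application as a Debug build, start WinDbg and run the application under the debugger. At the initial breakpoint, run the following command: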
sxe -c ".loadby sos clr;g" ld clr.dll;.ocommand OCOMMAND;g
The application then debugs itself, and you can watch the threads change:
Step                  .NET threads  Unstarted  Dead     Thread objects  Native threads
1 (before started)    2             0          0        1               4
2 (Thread started)    3             1 (XXX)    0        2               5
3 (Thread running)    3             0          0        3               8
4 (Thread ended)      3             0          1 (XXX)  2               7
5 (GC ran)            3             0          1 (XXX)  2               4
SSCCE code:
using System;
using System.Diagnostics;
using System.Threading;

namespace ManagedThreadDebug
{
    class Program
    {
        static void Main()
        {
            InformDebug("Before creating thread object.");
            var t = new Thread(ThreadRun);
            // Logged before Start() is actually called, so the thread is still unstarted here.
            InformDebug("After creating thread object and calling Start().");
            t.Start();
            InformDebug("While thread is running.");
            t.Join();
            InformDebug("After thread was running (GC potentially not run yet).");
            // Force two full collections with short pauses so the finalizer thread gets a chance to run.
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Thread.Sleep(10);
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Thread.Sleep(10);
            InformDebug("After thread was running (GC hopefully ran).");
        }

        private static void ThreadRun()
        {
            Thread.Sleep(1000);
        }

        private static void InformDebug(string message)
        {
            Console.WriteLine(message);
            // The OCOMMAND prefix makes WinDbg (.ocommand) execute the rest of the line as debugger commands.
            Trace.WriteLine("OCOMMAND .echo >>> "+message+";!threads;.echo;!dumpheap -stat -type Thread;.echo;~;g");
        }
    }
}
Almost complete output, shortened for brevity:
>>> Before creating thread object.
ThreadCount: 2
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 0
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310  2a020 Preemptive 02796F48:00000000 00408378     1 MTA
   2    2 1fb8 00411258  2b220 Preemptive 00000000:00000000 00408378     0 MTA (Finalizer)
Statistics:
MT Count TotalSize Class Name
69f02e64 1 52 System.Threading.Thread
. 0 Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
1 Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
2 Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
3 Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen
>>> After creating thread object and calling Start().
ThreadCount: 3
UnstartedThread: 1
BackgroundThread: 1
PendingThread: 0
DeadThread: 0
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310  2a020 Preemptive 02797334:00000000 00408378     1 MTA
   2    2 1fb8 00411258  2b220 Preemptive 00000000:00000000 00408378     0 MTA (Finalizer)
XXXX    3    0 00474900   1400 Preemptive 00000000:00000000 00408378     0 Ukn
Statistics:
MT Count TotalSize Class Name
69f02e64 2 104 System.Threading.Thread
. 0 Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
1 Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
2 Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
3 Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen
4 Id: b78.27d8 Suspend: 1 Teb: 7efac000 Unfrozen
>>> While thread is running.
ThreadCount: 3
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 0
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310  2a020 Preemptive 02797550:00000000 00408378     1 MTA
   2    2 1fb8 00411258  2b220 Preemptive 00000000:00000000 00408378     0 MTA (Finalizer)
   6    3 1d04 00474900  2b020 Preemptive 00000000:00000000 00408378     1 MTA
Statistics:
MT Count TotalSize Class Name
69f02e64 2 104 System.Threading.Thread
. 0 Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
1 Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
2 Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
3 Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen
4 Id: b78.27d8 Suspend: 1 Teb: 7efac000 Unfrozen
5 Id: b78.2478 Suspend: 1 Teb: 7efa9000 Unfrozen
6 Id: b78.1d04 Suspend: 1 Teb: 7efa6000 Unfrozen
7 Id: b78.1fdc Suspend: 1 Teb: 7efa3000 Unfrozen
>>> After thread was running (GC potentially not run yet).
ThreadCount: 3
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 1
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310  2a020 Preemptive 027977FC:00000000 00408378     1 MTA
   2    2 1fb8 00411258  2b220 Preemptive 00000000:00000000 00408378     0 MTA (Finalizer)
XXXX    3    0 00474900  39820 Preemptive 00000000:00000000 00408378     0 Ukn
Statistics:
MT Count TotalSize Class Name
69f02e64 2 104 System.Threading.Thread
. 0 Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
1 Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
2 Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
3 Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen
4 Id: b78.27d8 Suspend: 1 Teb: 7efac000 Unfrozen
5 Id: b78.2478 Suspend: 1 Teb: 7efa9000 Unfrozen
7 Id: b78.1fdc Suspend: 1 Teb: 7efa3000 Unfrozen
>>> After thread was running (GC hopefully ran).
ThreadCount: 3
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 1
                                                                     Lock
       ID OSID ThreadOBJ State GC Mode    GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310  2a020 Preemptive 02797380:00000000 00408378     1 MTA
   2    2 1fb8 00411258  2b220 Preemptive 00000000:00000000 00408378     0 MTA (Finalizer)
XXXX    3    0 00474900  39820 Preemptive 00000000:00000000 00408378     0 Ukn
Statistics:
MT Count TotalSize Class Name
69f02e64 2 104 System.Threading.Thread
. 0 Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
1 Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
2 Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
3 Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen
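Threads shown as XXXX can be either unstarted threads or dead threads. You might not like this answer: unless you show us some code, it is impossible to tell where these threads come from. Potential candidates:
Run the application in WinDbg and stop whenever a thread starts or exits:
sxe ct;sxe et
Then look at where this happens and, in particular, check the code that creates the threads. If that is not specific enough, you can also try breakpoints on the .NET thread methods.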
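When a dump contains many XXXX rows, it can help to sort them into unstarted and dead threads by their State column. The following is a minimal sketch that classifies one row of !threads output. It assumes the whitespace-separated column order shown above (debugger ID, managed ID, OSID, ThreadOBJ, State) and the SSCLI 2.0 values for TS_Unstarted and TS_Dead; XxxxKind and classifyThreadsRow are hypothetical names.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// ASSUMPTION: bit values as in the SSCLI 2.0 Thread::ThreadState enum.
constexpr std::uint32_t TS_Unstarted = 0x00000400;
constexpr std::uint32_t TS_Dead      = 0x00000800;

enum class XxxxKind { NotXxxx, Unstarted, Dead, Other };

// Classifies a single row of !threads output.
// Rows whose first column is not "XXXX" (including header lines) are NotXxxx.
XxxxKind classifyThreadsRow(const std::string& row) {
    std::istringstream in(row);
    std::string dbgId, managedId, osId, threadObj, stateHex;
    if (!(in >> dbgId >> managedId >> osId >> threadObj >> stateHex)) {
        return XxxxKind::NotXxxx;  // too few columns, e.g. the "Lock" line
    }
    if (dbgId != "XXXX") {
        return XxxxKind::NotXxxx;
    }
    std::uint32_t state = 0;
    try {
        // The State column is printed in hexadecimal without a 0x prefix.
        state = static_cast<std::uint32_t>(std::stoul(stateHex, nullptr, 16));
    } catch (const std::exception&) {
        return XxxxKind::Other;  // State column could not be parsed
    }
    if (state & TS_Dead) {
        return XxxxKind::Dead;
    }
    if (state & TS_Unstarted) {
        return XxxxKind::Unstarted;
    }
    return XxxxKind::Other;
}
```

Applied to the output above, the XXXX row with state 1400 is classified as Unstarted and the rows with state 39820 as Dead. A pool that fills up with rows of the first kind matches the symptom described in the question.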