What would cause so many unstarted threads?

Date: 2014-07-10 02:33:26

Tags: .net multithreading windbg

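I have run into a very strange bug.

My application is a WinForms client that connects to a server over WCF. It references several .NET and C++ modules/DLLs.

For some reason, I set ThreadPool.SetMaxThreads(150, 200) in my code. After running for several hours, the client gets disconnected from the server.

After debugging with WinDbg, I found that the thread pool was full of strange threads. As a result, no new threads could be created in the thread pool, and I think WCF could not create threads to talk to the server either, which caused the disconnect.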
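One way to watch for this kind of exhaustion from inside the process, without attaching a debugger, is to log how many pool threads are in use. The following is only a minimal sketch: the class and method names (PoolMonitor, GetBusyThreads) are made up for illustration, and it is an assumption, not something confirmed here, that the leaked threads described above are reflected in these counters.

```csharp
using System;
using System.Threading;

static class PoolMonitor
{
    // Computes how many worker and I/O completion threads are currently in use,
    // as the difference between the configured maximum and the available count.
    public static void GetBusyThreads(out int busyWorkers, out int busyIo)
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        busyWorkers = maxWorkers - freeWorkers;
        busyIo = maxIo - freeIo;
    }

    static void Main()
    {
        int busyWorkers, busyIo;
        GetBusyThreads(out busyWorkers, out busyIo);
        Console.WriteLine("busy workers: {0}, busy I/O threads: {1}", busyWorkers, busyIo);
    }
}
```

Logging these numbers periodically (for example from a timer) would show whether the busy count creeps toward the limit set with SetMaxThreads before the disconnect happens.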

These strange threads look like this:

                                                                         Lock  
      ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt
XXXX   3  cb8 0043afd8      1400 Preemptive  00000000:00000000 003f3248 0     Ukn 

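According to Yun Jin's WebLog series "Thread, System.Threading.Thread, and !Threads" and the SSCLI 2.0 source code, the most likely way to get such threads is that the CLR creates a new thread in the thread pool and that thread is never resumed.

I would like to know why or how resuming a thread, or many threads, can fail.

Here are more technical details:

When the CLR creates a new thread in the thread pool, it calls the SetupUnstartedThread method and the CreateNewThread/CreateNewOSThread methods.

After SetupUnstartedThread, the CLR has created a thread like this: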

                                                                         Lock  
      ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt
XXXX   3    0 0043afd8      1400 Preemptive  00000000:00000000 003f3248 0     Ukn 

Its state is 0x1400 (TS_Unstarted | TS_WeOwn), and it has no OSID and no debugger ID (XXXX).
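To make the State column of !threads easier to read, the value can be split into its individual flag bits. The sketch below is a hypothetical helper (ThreadStateDecoder is not part of SOS or the CLR). It knows only a handful of bits, and their numeric values are an assumption based on my reading of the ThreadState enum in the SSCLI 2.0 threads.h. Only the combination 0x1400 = TS_Unstarted | TS_WeOwn is confirmed by the text above.

```csharp
using System;
using System.Collections.Generic;

static class ThreadStateDecoder
{
    // Small subset of the CLR ThreadState bits (values assumed from SSCLI 2.0).
    static readonly KeyValuePair<uint, string>[] Flags =
    {
        new KeyValuePair<uint, string>(0x0020, "TS_LegalToJoin"),
        new KeyValuePair<uint, string>(0x0200, "TS_Background"),
        new KeyValuePair<uint, string>(0x0400, "TS_Unstarted"),
        new KeyValuePair<uint, string>(0x0800, "TS_Dead"),
        new KeyValuePair<uint, string>(0x1000, "TS_WeOwn"),
    };

    // Returns the names of all known bits set in 'state';
    // bits this table does not know are reported as a hex remainder.
    public static string Decode(uint state)
    {
        var names = new List<string>();
        uint rest = state;
        foreach (var flag in Flags)
        {
            if ((state & flag.Key) != 0)
            {
                names.Add(flag.Value);
                rest &= ~flag.Key;
            }
        }
        if (rest != 0)
        {
            names.Add("0x" + rest.ToString("x") + " (not decoded)");
        }
        return string.Join(" | ", names);
    }

    static void Main()
    {
        Console.WriteLine(Decode(0x1400)); // prints "TS_Unstarted | TS_WeOwn"
    }
}
```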

After CreateNewThread/CreateNewOSThread, the thread becomes:

                                                                         Lock  
      ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt
XXXX   3  cb8 0043afd8      1400 Preemptive  00000000:00000000 003f3248 0     Ukn 

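It now has an OSID but still no debugger ID (XXXX).

In addition, the thread's ExposedObject field is null.

But if the thread is resumed successfully, which means that ntdll!LdrInitializeThunk gets called, the thread gets a debugger ID (2):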

                                                                         Lock  
   ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt
2   3  cb8 0043afd8      1400 Preemptive  00000000:00000000 003f3248 0     Ukn 

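The state of this thread differs from the faulty state (the one without a debugger ID).

Edit for Thomas W

If the option (c) you mentioned is

(c) a special OS thread in the CLR that may run managed code.

then according to the SSCLI 2.0 source code, if an OS thread wants to access managed code, the CLR calls the SetupThread method, which runs the following code: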

// reset any unstarted bits on the thread object
FastInterlockAnd((ULONG *) &pThread->m_State, ~Thread::TS_Unstarted);
FastInterlockOr((ULONG *) &pThread->m_State, Thread::TS_LegalToJoin);

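So the state would definitely not be 0x1400.

None of the strange threads has a corresponding thread in the ~ thread list, so you cannot see them in !runaway either.

Edit 2

Sorry for the late update to this post. I have not found the root cause yet, but I have found a workaround: replace .NET Framework 4.0 with .NET Framework 4.5.

The following gives more details on how I found the workaround.

Some time ago I traced the whole life cycle of these strange threads. As we all know, the CLR has a Gate Thread (a thread that helps monitor the status of completion port threads and worker threads; there is only one). When my application started to go wrong, the Gate Thread called clr!ThreadpoolMgr::CreateWorkerThread periodically, which creates a new CLR thread object and a new OS thread object.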

0:004> k
ChildEBP RetAddr  
04c8f6f8 6f3ea8ff KERNEL32!CreateThreadStub
04c8f744 6f3ea77b clr!Thread::CreateNewOSThread+0xba
04c8f78c 6f3eabc1 clr!Thread::CreateNewThread+0xa9
04c8f81c 6f4a6aed clr!ThreadpoolMgr::CreateUnimpersonatedThread+0xbb
04c8f83c 6f4a560e clr!ThreadpoolMgr::CreateWorkerThread+0x19
04c8f864 6f4a4457 clr!ThreadpoolMgr::EnsureEnoughWorkersWorking+0x116
04c8f94c 75973c45 clr!ThreadpoolMgr::GateThreadStart+0x431
04c8f958 771a37f5 KERNEL32!BaseThreadInitThunk+0xe
04c8f998 771a37c8 ntdll!__RtlUserThreadStart+0x70
04c8f9b0 00000000 ntdll!_RtlUserThreadStart+0x1b

The new thread looks like this:

                                                                         Lock  
      ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt
XXXX   3  cb8 0043afd8      1400 Preemptive  00000000:00000000 003f3248 0     Ukn 

I guessed that this thread might never be resumed. It turned out I was wrong. After a while, the thread called ntdll!LdrInitializeThunk and then ntdll!_RtlUserThreadStart.

0:065> k
ChildEBP RetAddr  
1d54f7c0 75973c45 clr!Thread::intermediateThreadProc
1d54f7cc 771a37f5 KERNEL32!BaseThreadInitThunk+0xe
1d54f80c 771a37c8 ntdll!__RtlUserThreadStart+0x70
1d54f824 00000000 ntdll!_RtlUserThreadStart+0x1b
                                                                         Lock  
      ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt
  65   3  cb8 0043afd8      1400 Preemptive  00000000:00000000 003f3248 0     Ukn 

After checking the argument of clr!Thread::intermediateThreadProc, I found that this thread would call clr!ThreadpoolMgr::WorkerThreadStart.

Then the magic happened.

After clr!ThreadpoolMgr::WorkerThreadStart finishes, clr!ThreadStore::RemoveThread should normally be called by the finalizer thread before the thread stops. But not this time.

There was no clr!ThreadStore::RemoveThread, just

0:065> k
ChildEBP RetAddr  
1889fb04 7716f73a ntdll!LdrpCallInitRoutine+0x14
1889fba8 7716f63b ntdll!LdrShutdownThread+0xe6
1889fbb8 75973c4c ntdll!RtlExitUserThread+0x2a
1889fbc4 771a37f5 KERNEL32!BaseThreadInitThunk+0x15
1889fc04 771a37c8 ntdll!__RtlUserThreadStart+0x70
1889fc1c 00000000 ntdll!_RtlUserThreadStart+0x1b

So the corresponding OS thread had been destroyed, but the CLR thread still existed.

                                                                         Lock  
      ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt
XXXX   3  cb8 0043afd8      1400 Preemptive  00000000:00000000 003f3248 0     Ukn 

Maybe you will ask why the thread's state did not change. For some reason, I did not dig into clr!ThreadpoolMgr::WorkerThreadStart at that time, so I cannot give you an answer. But I read the SSCLI 2.0 source code again and can offer another guess (^_^).

clr!ThreadpoolMgr::WorkerThreadStart calls clr!SetupThreadPoolThreadNoThrow. Here is a code snippet from clr!SetupThreadPoolThreadNoThrow:

EX_TRY
{
    pThread = SetupThreadPoolThread(typeTPThread);
}
EX_CATCH
{
    if (pHR)
    {
        *pHR = GET_EXCEPTION()->GetHR();
    }
}
EX_END_CATCH(SwallowAllExceptions);

Note the "SwallowAllExceptions". You can also see that this method calls clr!SetupThreadPoolThread. Again, a code snippet:

if (NULL == (pThread = GetThread()))
{
    pThread = SetupInternalThread();
}
if ((pThread != NULL) && ((pThread->m_State & Thread::TS_ThreadPoolThread) == 0))
{
    if (typeTPThread == WorkerThread)
    {
        FastInterlockOr((ULONG *) &pThread->m_State, Thread::TS_ThreadPoolThread | Thread::TS_TPWorkerThread);
    }
    else if (typeTPThread == CompletionPortThread)
    {
        FastInterlockOr((ULONG *) &pThread->m_State, Thread::TS_ThreadPoolThread | Thread::TS_CompletionPortThread);
    }
    else
    {
        FastInterlockOr((ULONG *) &pThread->m_State, Thread::TS_ThreadPoolThread);
    }
}

So my guess is that if an exception occurred somewhere in this code path before the FastInterlockOr calls ran, it would be swallowed silently and the thread's state would never be changed.
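The guess above can be illustrated with a small C# analogy. This is not CLR code: SwallowDemo, SetupPoolThread and DemoPoolThreadBit are hypothetical stand-ins, and the bit value 0x1 is purely illustrative. It only shows the general pattern of a catch-all around a setup routine leaving a state word untouched when the routine throws before updating it.

```csharp
using System;

static class SwallowDemo
{
    // 0x1400 = TS_Unstarted | TS_WeOwn, the state reported by !threads above.
    const int InitialState = 0x1400;
    // Hypothetical stand-in for the pool-thread bits; NOT the real CLR value.
    const int DemoPoolThreadBit = 0x1;

    static int state = InitialState;

    // Stand-in for SetupThreadPoolThread: may throw before the state is updated.
    static void SetupPoolThread(bool failEarly)
    {
        if (failEarly)
        {
            throw new InvalidOperationException("simulated failure during setup");
        }
        state |= DemoPoolThreadBit; // stands in for the FastInterlockOr calls
    }

    // Stand-in for SetupThreadPoolThreadNoThrow: swallows every exception.
    public static int Run(bool failEarly)
    {
        state = InitialState;
        try
        {
            SetupPoolThread(failEarly);
        }
        catch (Exception)
        {
            // Comparable in spirit to EX_END_CATCH(SwallowAllExceptions).
        }
        return state;
    }

    static void Main()
    {
        Console.WriteLine("0x{0:x}", Run(false)); // prints 0x1401
        Console.WriteLine("0x{0:x}", Run(true));  // prints 0x1400
    }
}
```

In the failing case the caller sees no exception and the state word still reads 0x1400, which is at least consistent with what !threads showed for the strange threads.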

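That was the first time I thought there might be a small flaw in the .NET Framework that my application triggers. Meanwhile, a colleague told me that he could not reproduce the bug. After checking his environment, I found that he was using .NET Framework 4.5.

So far, the bug has not occurred again after upgrading the .NET Framework.

1 Answer:

Answer 0 (score: 1)

SSCCE for analyzing threads

To see how .NET creates managed threads and how they come to be shown as XXXX, you can run the following code. Compile the application as a Debug build, start WinDbg, and run the application under the debugger. At the initial breakpoint, run the following command: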

sxe -c ".loadby sos clr;g" ld clr.dll;.ocommand OCOMMAND;g

The application will then more or less debug itself, and you will see how the threads change:

Step                .NET threads  Unstarted  Dead     Thread objects  Native threads
1 (before started)  2             0          0        1               4
2 (Thread started)  3             1 (XXX)    0        2               5
3 (Thread running)  3             0          0        2               8
4 (Thread ended)    3             0          1 (XXX)  2               7
5 (GC ran)          3             0          1 (XXX)  2               4

SSCCE code:

using System;
using System.Diagnostics;
using System.Threading;

namespace ManagedThreadDebug
{
    class Program
    {
        static void Main()
        {
            InformDebug("Before creating thread object.");

            var t = new Thread(ThreadRun);
            // Note: despite the wording of the message, Start() has not been called yet at this point.
            InformDebug("After creating thread object and calling Start().");

            t.Start();
            InformDebug("While thread is running.");

            t.Join();
            InformDebug("After thread was running (GC potentially not run yet).");

            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Thread.Sleep(10);
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Thread.Sleep(10);
            InformDebug("After thread was running (GC hopefully ran).");
        }

        private static void ThreadRun()
        {
            Thread.Sleep(1000);
        }

        private static void InformDebug(string message)
        {
            Console.WriteLine(message);
            Trace.WriteLine("OCOMMAND .echo >>> "+message+";!threads;.echo;!dumpheap -stat -type Thread;.echo;~;g");
        }
    }
}

Almost complete output, shortened for brevity:

>>> Before creating thread object.
ThreadCount:      2
UnstartedThread:  0
BackgroundThread: 1
PendingThread:    0
DeadThread:       0
                                                                         Lock  
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310     2a020 Preemptive  02796F48:00000000 00408378 1     MTA 
   2    2 1fb8 00411258     2b220 Preemptive  00000000:00000000 00408378 0     MTA (Finalizer) 

Statistics:
      MT    Count    TotalSize Class Name
69f02e64        1           52 System.Threading.Thread

.  0  Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
   1  Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
   2  Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
   3  Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen

>>> After creating thread object and calling Start().
ThreadCount:      3
UnstartedThread:  1
BackgroundThread: 1
PendingThread:    0
DeadThread:       0
                                                                         Lock  
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310     2a020 Preemptive  02797334:00000000 00408378 1     MTA 
   2    2 1fb8 00411258     2b220 Preemptive  00000000:00000000 00408378 0     MTA (Finalizer) 
XXXX    3    0 00474900      1400 Preemptive  00000000:00000000 00408378 0     Ukn 

Statistics:
      MT    Count    TotalSize Class Name
69f02e64        2          104 System.Threading.Thread

.  0  Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
   1  Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
   2  Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
   3  Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen
   4  Id: b78.27d8 Suspend: 1 Teb: 7efac000 Unfrozen

>>> While thread is running.
ThreadCount:      3
UnstartedThread:  0
BackgroundThread: 1
PendingThread:    0
DeadThread:       0
                                                                         Lock  
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310     2a020 Preemptive  02797550:00000000 00408378 1     MTA 
   2    2 1fb8 00411258     2b220 Preemptive  00000000:00000000 00408378 0     MTA (Finalizer) 
   6    3 1d04 00474900     2b020 Preemptive  00000000:00000000 00408378 1     MTA 

Statistics:
      MT    Count    TotalSize Class Name
69f02e64        2          104 System.Threading.Thread

.  0  Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
   1  Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
   2  Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
   3  Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen
   4  Id: b78.27d8 Suspend: 1 Teb: 7efac000 Unfrozen
   5  Id: b78.2478 Suspend: 1 Teb: 7efa9000 Unfrozen
   6  Id: b78.1d04 Suspend: 1 Teb: 7efa6000 Unfrozen
   7  Id: b78.1fdc Suspend: 1 Teb: 7efa3000 Unfrozen

 >>> After thread was running (GC potentially not run yet).
ThreadCount:      3
UnstartedThread:  0
BackgroundThread: 1
PendingThread:    0
DeadThread:       1
                                                                         Lock  
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310     2a020 Preemptive  027977FC:00000000 00408378 1     MTA 
   2    2 1fb8 00411258     2b220 Preemptive  00000000:00000000 00408378 0     MTA (Finalizer) 
XXXX    3    0 00474900     39820 Preemptive  00000000:00000000 00408378 0     Ukn 

Statistics:
      MT    Count    TotalSize Class Name
69f02e64        2          104 System.Threading.Thread

.  0  Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
   1  Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
   2  Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
   3  Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen
   4  Id: b78.27d8 Suspend: 1 Teb: 7efac000 Unfrozen
   5  Id: b78.2478 Suspend: 1 Teb: 7efa9000 Unfrozen
   7  Id: b78.1fdc Suspend: 1 Teb: 7efa3000 Unfrozen

>>> After thread was running (GC hopefully ran).
ThreadCount:      3
UnstartedThread:  0
BackgroundThread: 1
PendingThread:    0
DeadThread:       1
                                                                         Lock  
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 1074 00441310     2a020 Preemptive  02797380:00000000 00408378 1     MTA 
   2    2 1fb8 00411258     2b220 Preemptive  00000000:00000000 00408378 0     MTA (Finalizer) 
XXXX    3    0 00474900     39820 Preemptive  00000000:00000000 00408378 0     Ukn 

Statistics:
      MT    Count    TotalSize Class Name
69f02e64        2          104 System.Threading.Thread

.  0  Id: b78.1074 Suspend: 1 Teb: 7efdd000 Unfrozen
   1  Id: b78.2194 Suspend: 1 Teb: 7efda000 Unfrozen
   2  Id: b78.1fb8 Suspend: 1 Teb: 7efd7000 Unfrozen
   3  Id: b78.1500 Suspend: 1 Teb: 7efaf000 Unfrozen

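Conclusion

Threads shown as XXXX can be either unstarted threads or dead threads. You probably won't like this answer: unless you show us some code, it is impossible to tell where these threads come from. Potential candidates:

  • new Thread() statements in your code
  • use of Parallel.For and similar
  • use of the ThreadPool
  • code in third-party libraries

Debugging thread start and exit

Run the application in WinDbg and break whenever a thread starts or exits: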

sxe ct;sxe et

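Then look at what is going on at that point, and in particular check the code that creates the threads. If that is not specific enough, you can also try breakpoints on the .NET thread methods.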