WCF application hangs (WinDBG output included)

Time: 2013-08-20 13:39:46

Tags: c# wcf windbg hang

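We have a problem with a WCF service that we cannot reproduce. The service sometimes stops responding to client calls. This often happens on Mondays, after a period of inactivity.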

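The WCF service is self-hosted in a Windows service. The instance context mode is PerCall. It uses NetTcpBinding without security, and the entire WCF configuration is done in code, with no XML configuration. We have set the ServiceThrottle parameters for sessions, calls, and instances to 1024. Here is the ServiceHost configuration: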



    ServiceThrottlingBehavior throttle;
    throttle = _svcHost.Description.Behaviors.Find<ServiceThrottlingBehavior>();
    if (throttle == null)
    {
        throttle = new ServiceThrottlingBehavior();
        throttle.MaxConcurrentCalls = 1024;
        throttle.MaxConcurrentSessions = 1024;
        throttle.MaxConcurrentInstances = 1024;
        _svcHost.Description.Behaviors.Add(throttle);
    }

    ...

    TimeSpan timeout = new TimeSpan(0, 0, 5);

    NetTcpBinding binding = new NetTcpBinding(SecurityMode.None);
    binding.OpenTimeout = timeout;
    binding.CloseTimeout = timeout;
    binding.ReceiveTimeout = timeout;
    binding.SendTimeout = timeout;
    binding.MaxBufferSize = 10485760;
    binding.MaxReceivedMessageSize = 10485760;

    XmlDictionaryReaderQuotas quotas = new XmlDictionaryReaderQuotas();
    binding.ReaderQuotas = quotas;
    binding.ReaderQuotas.MaxStringContentLength = 10485760;
    binding.ReaderQuotas.MaxArrayLength = 10000;

    binding.Security.Message.ClientCredentialType = MessageCredentialType.None;

    ...

    ServiceEndpoint endpoint = _svcHost.AddServiceEndpoint(intfType, binding, serviceBaseAddress + "/" + intfType.Name);
    endpoint.Behaviors.Add(new ClientTrackerEndpointBehavior());


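The problem shows up as exceptions thrown on the client side. Five to ten clients are connected to the service, and every one of them throws this type of exception (even clients running on the same machine as the service itself):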



System.ServiceModel.CommunicationException
  The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:00:29.9969997'.
  An existing connection was forcibly closed by the remote host
StackTrace:
   at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.ServiceModel.Channels.SocketConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)


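After the exception was thrown, we tried to connect to the service manually with telnet, and that connection attempt appeared to be refused as well. Because the problem occurs on the production systems of 3 customers, where we cannot attach the Visual Studio debugger, we used WinDBG to create user-mode minidumps to analyze the problem. The first thing we checked was the current values of the ServiceThrottle (only one dump is shown here, but the other dumps produced equivalent output):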



    0:032> !dumpheap -type ServiceThrottle -short
    01fb65a4 

    0:032> !do 01fb65a4 
    Name:        System.ServiceModel.Dispatcher.ServiceThrottle
    MethodTable: 70cc56f0
    EEClass:     70a07ce4
    Size:        40(0x28) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    70cc572c  400314f        4 ...cher.FlowThrottle  0 instance 01fb6660 calls
    70cc572c  4003150        8 ...cher.FlowThrottle  0 instance 01fb6760 sessions
    71725030  4003151        c ...her.QuotaThrottle  0 instance 00000000 dynamic
    70cc572c  4003152       10 ...cher.FlowThrottle  0 instance 02013110 instanceContexts
    70cb653c  4003153       14 ...l.ServiceHostBase  0 instance 01fadd84 host
    70cc7cf8  4003154       18 ...manceCountersBase  0 instance 02003590 servicePerformanceCounters
    74246788  4003155       20       System.Boolean  1 instance        1 isActive
    7423f744  4003156       1c        System.Object  0 instance 01fb65cc thisLock
    74242ad4  400314d     1134         System.Int32  1   static      128 DefaultMaxConcurrentCallsCpuCount
    74242ad4  400314e     1138         System.Int32  1   static      800 DefaultMaxConcurrentSessionsCpuCount

    0:032> !do 01fb6660 
    Name:        System.ServiceModel.Dispatcher.FlowThrottle
    MethodTable: 70cc572c
    EEClass:     70a07d24
    Size:        52(0x34) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    74242ad4  4002ede       20         System.Int32  1 instance     1024 capacity
    74242ad4  4002edf       24         System.Int32  1 instance        0 count
    74246788  4002ee0       2c       System.Boolean  1 instance        0 warningIssued
    74242ad4  4002ee1       28         System.Int32  1 instance       89 warningRestoreLimit
    7423f744  4002ee2        4        System.Object  0 instance 01fb6694 mutex
    74232914  4002ee3        8 ...ding.WaitCallback  0 instance 01fb6640 release
    00000000  4002ee4        c                       0 instance 01fb66a0 waiters
    7423fb08  4002ee5       10        System.String  0 instance 01fb65d8 propertyName
    7423fb08  4002ee6       14        System.String  0 instance 01fb660c configName
    74230f78  4002ee7       18        System.Action  0 instance 02007670 acquired
    74230f78  4002ee8       1c        System.Action  0 instance 02007690 released

    0:032> !do 01fb6760 
    Name:        System.ServiceModel.Dispatcher.FlowThrottle
    MethodTable: 70cc572c
    EEClass:     70a07d24
    Size:        52(0x34) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    74242ad4  4002ede       20         System.Int32  1 instance     1024 capacity
    74242ad4  4002edf       24         System.Int32  1 instance        0 count
    74246788  4002ee0       2c       System.Boolean  1 instance        0 warningIssued
    74242ad4  4002ee1       28         System.Int32  1 instance      560 warningRestoreLimit
    7423f744  4002ee2        4        System.Object  0 instance 01fb6794 mutex
    74232914  4002ee3        8 ...ding.WaitCallback  0 instance 01fb6740 release
    00000000  4002ee4        c                       0 instance 01fb67a0 waiters
    7423fb08  4002ee5       10        System.String  0 instance 01fb66d0 propertyName
    7423fb08  4002ee6       14        System.String  0 instance 01fb6708 configName
    74230f78  4002ee7       18        System.Action  0 instance 020076b0 acquired
    74230f78  4002ee8       1c        System.Action  0 instance 020076d0 released

All of these values look fine. We then checked the thread pool and the threads:



    0:032> !threadpool
    CPU utilization: 0%
    Worker Thread: Total: 1023 Running: 1017 Idle: 6 MaxLimit: 1023 MinLimit: 1000
    Work Request in Queue: 0
    --------------------------------------
    Number of Timers: 4
    --------------------------------------
    Completion Port Thread:Total: 32 Free: 0 MaxFree: 16 CurrentLimit: 33 MaxLimit: 1000 MinLimit: 1000

    0:032> !threads -special
    ThreadCount:      1027
    UnstartedThread:  997
    BackgroundThread: 28
    PendingThread:    997
    DeadThread:       1
    Hosted Runtime:   no
                                       PreEmptive   GC Alloc                Lock
           ID  OSID ThreadOBJ    State GC           Context       Domain   Count APT Exception
       0    1  1d80 006b6518      a020 Enabled  00000000:00000000 006ac0b0     0 MTA
       2    2  238c 006c1840      b220 Enabled  00000000:00000000 006ac0b0     0 MTA (Finalizer)
    XXXX    4       00704f58   1019820 Enabled  00000000:00000000 006ac0b0     0 Ukn (Threadpool Worker)
       4    5  21a4 00706480   3009220 Enabled  0bedaf78:0bedb5c8 006ac0b0     0 MTA (Threadpool Worker)
       5    6   e8c 03de9428   100a220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
       7    7   634 03e0d318   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
       8    8  1d38 03ebeb08   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
       9    9  1808 03e4fd70   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      10    a  1c48 03e50d70   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      11    b  1d88 073be2b0   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      12    c  1c74 073bf2b8   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      13    d  1ae4 073c0dc8   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      14    e  1818 073c1598   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      15    f  1a58 073c1fa8   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      16   10  13e0 073c4bb8   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      17   11  1a3c 073c5388   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      18   12  1b5c 03e9ffe0   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      19   13  1b80 03ea04e8   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      20   14   900 03ea09f0   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      21   15  1c84 03ea0ef8   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      22   16   da0 03ea1400   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      23   17  13b0 03ea1908   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      24   18  18cc 03ea1e10   3009220 Enabled  00000000:00000000 006ac0b0     0 MTA (Threadpool Worker)
      25   1a  1008 03ea2820   3009220 Enabled  0bfe03b8:0bfe21c4 006ac0b0     0 MTA (Threadpool Worker)
      27   29   bc4 0b041588   1009220 Enabled  0bdd7c3c:0bdd99f8 006ac0b0     0 MTA (Threadpool Worker)
      28   3b   ad8 0af7d740   1009220 Enabled  0bf8c0d8:0bf8c1c4 006ac0b0     0 MTA (Threadpool Worker)
      29   64   dd8 0ae54890   1009220 Enabled  0bdfbb9c:0bdfd9f8 006ac0b0     0 MTA (Threadpool Worker)
      30   2f   440 0b03c010   1009220 Enabled  0be03bf0:0be059f8 006ac0b0     0 MTA (Threadpool Worker)
      31   25  2198 0b03f080   1009220 Enabled  0bd6d5b4:0bd6f410 006ac0b0     0 MTA (Threadpool Worker)
      32   63  1b9c 0ae41388   1009220 Enabled  0bdb5b14:0bdb79f8 006ac0b0     0 MTA (Threadpool Worker)
    XXXX   33  2270 0b09cd20      1400 Enabled  00000000:00000000 006ac0b0     0 Ukn
    XXXX   5b  1554 0ae54388      1400 Enabled  00000000:00000000 006ac0b0     0 MTA
    XXXX   31  1098 0ae53978      1400 Enabled  00000000:00000000 006ac0b0     0 Ukn
    XXXX   34   15c 0af7be18      1400 Enabled  00000000:00000000 006ac0b0     0 Ukn
    ... -> lots of more threads here
    XXXX  403  24e4 0d85b578      1400 Enabled  00000000:00000000 006ac0b0     0 Ukn
    XXXX  404  24d8 0d85ba80      1400 Enabled  00000000:00000000 006ac0b0     0 Ukn
    XXXX  405  24e0 0d85bf88      1400 Enabled  00000000:00000000 006ac0b0     0 Ukn

           OSID     Special thread type
        1    a88    DbgHelper 
        2   238c    Finalizer 
        4   21a4    ThreadpoolWorker 
        5    e8c    Timer 
        7    634    ThreadpoolWorker 
        8   1d38    ThreadpoolWorker 
        9   1808    ThreadpoolWorker 
       10   1c48    ThreadpoolWorker 
       11   1d88    ThreadpoolWorker 
       12   1c74    ThreadpoolWorker 
       13   1ae4    ThreadpoolWorker 
       14   1818    ThreadpoolWorker 
       15   1a58    ThreadpoolWorker 
       16   13e0    ThreadpoolWorker 
       17   1a3c    ThreadpoolWorker 
       18   1b5c    ThreadpoolWorker 
       19   1b80    ThreadpoolWorker 
       20    900    ThreadpoolWorker 
       21   1c84    ThreadpoolWorker 
       22    da0    ThreadpoolWorker 
       23   13b0    ThreadpoolWorker 
       24   18cc    ThreadpoolWorker 
       25   1008    ThreadpoolWorker 
       26   1b08    Gate 
       27    bc4    ThreadpoolWorker 
       28    ad8    ThreadpoolWorker 
       29    dd8    ThreadpoolWorker 
       30    440    ThreadpoolWorker 
       31   2198    ThreadpoolWorker 
       32   1b9c    ThreadpoolWorker 


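We were shocked to see this many threads. We now suspect that our problem may be related to the large number of threads. We would therefore be very grateful if someone could answer the following questions: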

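  1. Could our assumption be correct, or are we on the wrong track by investigating the threads?
  2. The !threadpool command reports a very high number of running threads (1017). Looking at the output of !threads, only 28 threads appear to be doing work. How can this difference be explained?
  3. We have a very large number of unstarted/pending threads. What is the difference between an unstarted and a pending thread? We tried to reproduce this behavior, but even when setting the minimum and maximum thread counts on the thread pool we did not get numbers this high. Moreover, a dump taken after 2 hours of process uptime showed no unstarted or pending threads (the service was still working at that point). The process uptime of the original dump was about 14 days.
  4. What does a Free value of 0 for the completion port threads mean?
  5. Are there other methods or commands in WinDBG that we could use to better understand our problem?

We are very unhappy with the current state of our software. Searching for information on this topic usually points to unclosed sessions or too many concurrent calls, but as shown above, that does not seem to be our problem. Any help is greatly appreciated!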

1 answer:

Answer 0 (score: 1)

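We ran into a similar problem with a self-hosted WCF interface that exposed a synchronous request/response web service in front of an asynchronous back end (two one-way service calls). Early in our testing we noticed that, after a varying number of days, our service stopped reacting to new requests. After some research we found that whenever the back-end service (which we do not control) failed to send a response, we waited indefinitely and therefore kept the client connection open.

We solved the problem by introducing a "wait time" configuration value, which guarantees that we respond to the client and close the connection. We used something like the following: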

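    // Run the processing on a task so that the wait can be bounded.
    Task processTask = Task.Factory.StartNew(() => Process(message));

    // Wait returns false if the configured wait time elapses first.
    bool isProcessSuccess = processTask.Wait(shared.ConfigReader.SyncWebServiceWaitTime);

    if (!isProcessSuccess)
    {
        // handle error ...
    }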
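
As a rough illustration of the bounded wait described above, here is a self-contained sketch. The names `BoundedWaitSketch`, `TryProcess`, and the simulated `Process` delay are hypothetical stand-ins invented for this example, not the code we actually ran:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedWaitSketch
{
    // Hypothetical stand-in for the back-end call: blocks for the given delay.
    public static string Process(string message, TimeSpan backendDelay)
    {
        Thread.Sleep(backendDelay);
        return "processed:" + message;
    }

    // Returns true and sets result if the work finished within waitTime;
    // returns false (result is null) if the wait timed out.
    public static bool TryProcess(string message, TimeSpan backendDelay,
                                  TimeSpan waitTime, out string result)
    {
        Task<string> processTask =
            Task.Factory.StartNew(() => Process(message, backendDelay));

        // Task.Wait(TimeSpan) returns false on timeout; it does not cancel the task.
        if (processTask.Wait(waitTime))
        {
            result = processTask.Result;
            return true;
        }

        result = null;
        return false;
    }

    public static void Main()
    {
        string result;

        bool fast = TryProcess("a", TimeSpan.FromMilliseconds(10),
                               TimeSpan.FromSeconds(2), out result);
        Console.WriteLine("fast back end completed in time: " + fast);

        bool slow = TryProcess("b", TimeSpan.FromSeconds(5),
                               TimeSpan.FromMilliseconds(100), out result);
        Console.WriteLine("slow back end completed in time: " + slow);
    }
}
```

Note that a timed-out task is not cancelled: it keeps occupying a thread pool thread until the underlying call returns. If the back end can block forever, the call itself also needs a timeout or cancellation, otherwise stuck threads can still accumulate over time.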

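The following link has information on WCF service performance counters, which may help you determine whether calls are being closed as expected: http://blogs.microsoft.co.il/blogs/idof/archive/2011/08/11/wcf-scaling-check-your-counters.aspx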
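
Alongside the performance counters, it may help to log the thread pool state from inside the service, so that the growth seen in the !threadpool output can be tracked over the days before a hang. This is only a sketch; the class name `ThreadPoolSnapshot` and the log format are made up for illustration:

```csharp
using System;
using System.Threading;

public static class ThreadPoolSnapshot
{
    // Formats the current thread pool limits and usage as a single log line.
    public static string Describe()
    {
        int minWorker, minIo, maxWorker, maxIo, freeWorker, freeIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);

        // GetAvailableThreads reports max minus currently active threads,
        // so busy = max - available.
        return string.Format(
            "worker busy={0} min={1} max={2}; io busy={3} min={4} max={5}",
            maxWorker - freeWorker, minWorker, maxWorker,
            maxIo - freeIo, minIo, maxIo);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Calling `Describe()` periodically from a dedicated thread, rather than from a thread pool timer that may itself be starved, would show whether the number of busy worker threads creeps toward the maximum.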