IIS detects a deadlock, but I can't confirm it from the memory dump

Time: 2014-01-07 10:20:57

Tags: asp.net iis deadlock windbg

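I have a customer who occasionally reports IIS deadlocks. It is a fairly large hosted ASP.Net application that spans multiple servers, but in this case the problem occurs on a simple web server that either returns static files (HTML, JavaScript) or acts as a proxy and calls web services on the application tier. Note that this is a .Net 3.5 application, and the application pool uses the classic pipeline.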

I have a dump and have been analyzing it, but as far as I can tell, no resource is blocked in a way that would produce a deadlock.

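The faulting thread is #4, and it belongs to IIS. The stack shows that it checks for health problems, finds one (a deadlock?), and fails the worker process.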

0:004> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 00000000`01b1e6c0 000007fe`f8c96d82 : 00000000`01b1e810 00000000`00000000 00000000`01b1e848 00000000`01b1e848 : KERNELBASE!RaiseException+0x39
01 00000000`01b1e790 000007fe`f80779cc : 00000000`05da4c58 00000000`00000082 00000000`05da4c58 00000000`00000082 : w3wphost!W3WP_HOST::FailWorkerProcess+0x2e
02 00000000`01b1e7e0 000007fe`f80728cb : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`000c000a : isapi!RegisterModule+0xcce4
03 00000000`01b1e830 000007fe`f806dd84 : 00000000`01b1ed40 00000000`00000020 00000000`00000004 000007fe`00000020 : isapi!RegisterModule+0x7be3
04 00000000`01b1ecf0 000007fe`f7f07459 : 00000000`05da4c58 00000000`05da4c58 00000000`05da4c58 000007fe`f806e06f : isapi!RegisterModule+0x309c
05 00000000`01b1edd0 000007fe`f7f07617 : 01cefba1`36242e6e 000007fe`f80419d6 00000000`00000000 00000000`05da4c58 : webengine!ReportHealthProblem+0xc9
06 00000000`01b1ef30 000007fe`f7f08d6b : 01cefba1`34feed30 01cefba1`36242e6e 00000000`05da4c58 00000000`00000000 : webengine!CheckAndReportHealthProblems+0xb7
07 00000000`01b1ef60 000007fe`f806c540 : 00000000`0126a088 00000000`05da4c58 00000000`01b1f330 000007fe`f8063588 : webengine!AspNetHttpExtensionProc+0x1db
...

The first thing I checked for was deadlocks, using !dlk; none were found.

0:004> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
No deadlocks detected.

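!dlk is said to miss some deadlocks, so next I checked !threads to see whether any threads hold locks. Quite a few do. These threads are calling web services on another server.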

0:004> !threads
ThreadCount: 63
UnstartedThread: 0
BackgroundThread: 57
PendingThread: 0
DeadThread: 6
Hosted Runtime: no
                                              PreEmptive                                                Lock
       ID OSID        ThreadOBJ     State   GC     GC Alloc Context                  Domain           Count APT Exception
   7    1  96c 00000000021394a0      8220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 Ukn
  16    2  9a4 0000000002142250      b220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 MTA (Finalizer)
  17    4  a64 00000000021913c0    80a220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 MTA (Threadpool Completion Port)
  18    5  a70 00000000021930f0      1220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 Ukn
  20    e  b6c 00000000022700e0   880b220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 MTA (Threadpool Completion Port)
   6    b  968 0000000005aaf0b0       220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 Ukn
   4   41  960 0000000005ab0220       220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 Ukn
   5   4f  964 0000000005a647c0       220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 Ukn
  30   ac  700 0000000005aafc50       220 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 Ukn
XXXX   a7    0 0000000005ab1960   1801820 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 MTA (Threadpool Worker)
XXXX   aa    0 0000000005dcf540   1801820 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 MTA (Threadpool Worker)
XXXX   be    0 0000000005dce3d0   1801820 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 Ukn (Threadpool Worker)
XXXX   ae    0 0000000005dcd830   1801820 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 MTA (Threadpool Worker)
XXXX   38    0 0000000005ab0dc0      9820 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 MTA
XXXX   39    0 0000000005ab2500      9820 Enabled  0000000000000000:0000000000000000 00000000012e41d0     0 MTA
  31   37 133c 0000000005ab07f0   180b220 Enabled  00000001c0668760:00000001c0668d50 0000000002193ba0     1 MTA (Threadpool Worker)
  32   34  ce8 0000000005d93550   180b220 Enabled  00000001806563a8:0000000180657ad0 0000000002193ba0     1 MTA (Threadpool Worker)
  33   c4  be4 0000000005d90ca0   180b220 Enabled  00000001c065e7b0:00000001c065ed50 0000000002248da0     1 MTA (Threadpool Worker)
  34  101  5a0 0000000005d923e0   180b220 Enabled  000000014072bfa0:000000014072daa0 0000000002248da0     1 MTA (Threadpool Worker)
  36   c5  6c0 0000000005d906d0   180b220 Enabled  000000010004d360:000000010004e150 0000000002193ba0     1 MTA (Threadpool Worker)
  37   35  76c 0000000005d940f0   180b220 Enabled  000000010005a950:000000010005c150 0000000002248da0     1 MTA (Threadpool Worker)
  38   36  cdc 0000000005d92f80   180b220 Enabled  0000000100069c20:000000010006a150 0000000002193ba0     1 MTA (Threadpool Worker)
  39   33  b90 0000000005d91e10   180b220 Enabled  000000014072f460:000000014072faa0 0000000002248da0     1 MTA (Threadpool Worker)
  40   32 12b8 0000000005d929b0   180b220 Enabled  0000000100083520:0000000100084150 0000000002193ba0     1 MTA (Threadpool Worker)
  41   31 11d8 0000000005d95e00   180b220 Enabled  0000000100091cb0:0000000100092150 0000000002193ba0     1 MTA (Threadpool Worker)
  42   30  d78 0000000005d946c0   180b220 Enabled  00000001000a3af8:00000001000a4150 0000000002193ba0     1 MTA (Threadpool Worker)
  43   2f  bd8 0000000005d95830   180b220 Enabled  00000001000b1200:00000001000b2150 0000000002193ba0     1 MTA (Threadpool Worker)
  44   2e  598 0000000005d91840   180b220 Enabled  00000001000bf808:00000001000c0150 0000000002193ba0     1 MTA (Threadpool Worker)
  45   2d  ba0 0000000005d93b20   180b220 Enabled  00000001000cd698:00000001000ce150 0000000002193ba0     1 MTA (Threadpool Worker)
  46   2c 136c 0000000005d94c90   180b220 Enabled  00000001000df068:00000001000e0150 0000000002193ba0     1 MTA (Threadpool Worker)
  47   ca  8f0 0000000005d90100   180b220 Enabled  00000001000ed618:00000001000ee150 0000000002193ba0     1 MTA (Threadpool Worker)
  48  102  d14 0000000005d95260   180b220 Enabled  00000001000fc7d0:00000001000fe150 0000000002193ba0     1 MTA (Threadpool Worker)
  49   cb 12c0 0000000005d91270   180b220 Enabled  000000010010cf88:000000010010e150 0000000002193ba0     1 MTA (Threadpool Worker)
  50   c7  e98 0000000005d969a0   180b220 Enabled  000000010011c618:000000010011e150 0000000002248da0     1 MTA (Threadpool Worker)
  51   e2  d74 0000000005d96f70   180b220 Enabled  000000010012b758:000000010012c150 0000000002248da0     1 MTA (Threadpool Worker)
  52   c2 1278 0000000005d97540   180b220 Enabled  00000001001395e0:000000010013a150 0000000002248da0     1 MTA (Threadpool Worker)
  53   c8  8e0 0000000005d963d0   180b220 Enabled  0000000100148898:000000010014a150 0000000002193ba0     1 MTA (Threadpool Worker)
  54   c6  24c 0000000005aaf680   180b220 Enabled  00000001001595d8:000000010015a150 0000000002193ba0     1 MTA (Threadpool Worker)
  55   c9  708 0000000005ab1f30   180b220 Enabled  0000000180658120:0000000180659ad0 0000000002248da0     1 MTA (Threadpool Worker)
  56   c3 110c 0000000005ab1390   180b220 Enabled  0000000100176ce8:0000000100178150 0000000002248da0     1 MTA (Threadpool Worker)
  57   cd  8dc 0000000005dc8100   180b220 Enabled  000000010018c0f8:000000010018c150 0000000002193ba0     1 MTA (Threadpool Worker)
  58   d1  588 0000000005dca9b0   180b220 Enabled  000000010019a620:000000010019c150 0000000002193ba0     1 MTA (Threadpool Worker)
  59   d0  31c 0000000005dc8ca0   180b220 Enabled  00000001001ab9a0:00000001001ac150 0000000002193ba0     1 MTA (Threadpool Worker)
  60   1a  cb4 0000000005dca3e0   180b220 Enabled  00000001001ba7f8:00000001001bc150 0000000002193ba0     1 MTA (Threadpool Worker)
  61   1b 13cc 0000000005dc9840   180b220 Enabled  00000001001ca798:00000001001cc150 0000000002193ba0     1 MTA (Threadpool Worker)
  62   1c 12f4 0000000005dccc90   180b220 Enabled  00000001001da7d0:00000001001dc150 0000000002193ba0     1 MTA (Threadpool Worker)
  63   1d 11c8 0000000005dc86d0   180b220 Enabled  00000001001eab48:00000001001ec150 0000000002193ba0     1 MTA (Threadpool Worker)
  64   1e 1304 0000000005dcde00   180b220 Enabled  00000001001fa960:00000001001fc150 0000000002248da0     1 MTA (Threadpool Worker)
  65   1f 1258 0000000005dcbb20   180b220 Enabled  000000010020ab18:000000010020c150 0000000002193ba0     1 MTA (Threadpool Worker)
  66   20  854 0000000005dc9270   180b220 Enabled  00000001407307c8:0000000140731aa0 0000000002193ba0     1 MTA (Threadpool Worker)
  67   21 13bc 0000000005dcaf80   180b220 Enabled  000000010022bd30:000000010022c150 0000000002248da0     1 MTA (Threadpool Worker)
  68   22  c4c 0000000005dc9e10   180b220 Enabled  00000001002409f0:0000000100242150 0000000002193ba0     1 MTA (Threadpool Worker)
  69   23 10dc 0000000005dcc6c0   180b220 Enabled  0000000100251c70:0000000100252150 0000000002193ba0     1 MTA (Threadpool Worker)
  70   24  264 0000000005dcc0f0   180b220 Enabled  0000000100261288:0000000100262150 0000000002193ba0     1 MTA (Threadpool Worker)
  71   25  3c8 0000000005dcb550   180b220 Enabled  0000000100271688:0000000100272150 0000000002193ba0     1 MTA (Threadpool Worker)
  72   26  b88 0000000005dcd260   180b220 Enabled  0000000100287420:0000000100288150 0000000002193ba0     1 MTA (Threadpool Worker)
  74   27 1318 0000000005dce9a0   180b220 Enabled  00000001002975c8:0000000100298150 0000000002248da0     1 MTA (Threadpool Worker)
  75   28  bdc 0000000005dcef70   180b220 Enabled  00000001002a6d48:00000001002a8150 0000000002193ba0     1 MTA (Threadpool Worker)
  76   29  100 0000000005a65930   180b220 Enabled  00000001002b7698:00000001002b8150 0000000002193ba0     1 MTA (Threadpool Worker)
  77   2a  e5c 0000000005a67070   180b220 Enabled  00000001002c70f8:00000001002c8150 0000000002248da0     1 MTA (Threadpool Worker)
  78   2b  434 0000000005a67c10   180b220 Enabled  00000001002d6b78:00000001002d8150 0000000002193ba0     1 MTA (Threadpool Worker)
  79   cc  e4c 0000000005a65360   180b220 Enabled  00000001002e65f8:00000001002e8150 0000000002193ba0     1 MTA (Threadpool Worker)
  80   3a  dd0 0000000005a64d90   180b220 Enabled  00000001002f66c0:00000001002f8150 0000000002193ba0     1 MTA (Threadpool Worker)

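Next I checked whether any threads were hung. Every thread is under a minute, and the ones over 10 seconds all belong to the ASP.Net infrastructure and are not running any application code.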

0:004> !runaway
User Mode Time
  Thread       Time
   4:960       0 days 0:00:39.843
   5:964       0 days 0:00:33.281
   6:968       0 days 0:00:25.906
   7:96c       0 days 0:00:24.000
  31:133c      0 days 0:00:09.437
  14:99c       0 days 0:00:06.953
  32:ce8       0 days 0:00:06.921
  15:9a0       0 days 0:00:06.890
  13:998       0 days 0:00:06.750
  30:700       0 days 0:00:06.062
  33:be4       0 days 0:00:04.937
  12:994       0 days 0:00:04.640
  26:9f8       0 days 0:00:02.859
  34:5a0       0 days 0:00:01.203
  16:9a4       0 days 0:00:00.765
   0:934       0 days 0:00:00.062
  17:a64       0 days 0:00:00.031
  65:1258      0 days 0:00:00.015
  52:1278      0 days 0:00:00.015
  21:1338      0 days 0:00:00.015
  80:dd0       0 days 0:00:00.000
  79:e4c       0 days 0:00:00.000
  78:434       0 days 0:00:00.000
  77:e5c       0 days 0:00:00.000
  76:100       0 days 0:00:00.000
  75:bdc       0 days 0:00:00.000
  74:1318      0 days 0:00:00.000
  73:77c       0 days 0:00:00.000
  72:b88       0 days 0:00:00.000
  71:3c8       0 days 0:00:00.000
  70:264       0 days 0:00:00.000
  69:10dc      0 days 0:00:00.000
  68:c4c       0 days 0:00:00.000
  67:13bc      0 days 0:00:00.000
  66:854       0 days 0:00:00.000
  64:1304      0 days 0:00:00.000
  63:11c8      0 days 0:00:00.000
  62:12f4      0 days 0:00:00.000
  61:13cc      0 days 0:00:00.000
  60:cb4       0 days 0:00:00.000
  59:31c       0 days 0:00:00.000
  58:588       0 days 0:00:00.000
  57:8dc       0 days 0:00:00.000
  56:110c      0 days 0:00:00.000
  55:708       0 days 0:00:00.000
  54:24c       0 days 0:00:00.000
  53:8e0       0 days 0:00:00.000
  51:d74       0 days 0:00:00.000
  50:e98       0 days 0:00:00.000
  49:12c0      0 days 0:00:00.000
  48:d14       0 days 0:00:00.000
  47:8f0       0 days 0:00:00.000
  46:136c      0 days 0:00:00.000
  45:ba0       0 days 0:00:00.000
  44:598       0 days 0:00:00.000
  43:bd8       0 days 0:00:00.000
  42:d78       0 days 0:00:00.000
  41:11d8      0 days 0:00:00.000
  40:12b8      0 days 0:00:00.000
  39:b90       0 days 0:00:00.000
  38:cdc       0 days 0:00:00.000
  37:76c       0 days 0:00:00.000
  36:6c0       0 days 0:00:00.000
  35:11b4      0 days 0:00:00.000
  29:11b8      0 days 0:00:00.000
  28:438       0 days 0:00:00.000
  27:12d8      0 days 0:00:00.000
  25:e44       0 days 0:00:00.000
  24:670       0 days 0:00:00.000
  23:1284      0 days 0:00:00.000
  22:139c      0 days 0:00:00.000
  20:b6c       0 days 0:00:00.000
  19:adc       0 days 0:00:00.000
  18:a70       0 days 0:00:00.000
  11:990       0 days 0:00:00.000
  10:988       0 days 0:00:00.000
   9:984       0 days 0:00:00.000
   8:970       0 days 0:00:00.000
   3:954       0 days 0:00:00.000
   2:940       0 days 0:00:00.000
   1:938       0 days 0:00:00.000

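Still curious about the locks held by those threads, I checked !mlocks. It shows that they are all thinlocks on System.Web.HttpApplication instances created by ASP.Net. There is no recursion either, so this looks fine too.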

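Next I checked !threadpool. I haven't used it much, so I'm not sure I'm interpreting the output correctly, but it looks like the application has not reached its limit (400) and there are no queued requests, so this seems fine.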

0:004> !threadpool
CPU utilization 0%
Worker Thread: Total: 48 Running: 48 Idle: 0 MaxLimit: 400 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 34
--------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 8 CurrentLimit: 0 MaxLimit: 400 MinLimit: 4

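I have been analyzing this for a few days and I don't know what to make of it. I would like to understand how to identify the "health problem" that IIS detected.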

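Update: The thread stacks are too large to include here, so I have uploaded them here. For privacy reasons, the company's types have been renamed to Foo.Bar. The method name 0 is the result of actual obfuscation.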

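Update 2: Thanks to the comments, I found KB 821268, which looks useful. I must not be interpreting the !threadpool output correctly. It says Total: 48, Running: 48 (and Idle: 0), which could mean the pool is exhausted, but then I don't know what to make of MaxLimit: 400. Perhaps someone can set me straight on this.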

1 Answer:

Answer 0 (score: 0)

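After some research and analysis, I was able to find the cause of the problem. I had been reading too much into the output of !threadpool. The data is useful, but all it told me was that I had not exhausted the worker threads or the completion port threads.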

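Instead, the real problem was the number of active network connections. This value is set in the .config file. For .Net 2+ environments it is usually auto-configured to 12 * #CPU. The machine in this dump has 4 CPUs, which gives a maximum of 48 open connections. That matches the number of threads reading data through HttpWebRequest, as shown in the stacks.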
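The arithmetic above can be written out as a quick sanity check. This is only a sketch: the function name and variables are made up for illustration, the 12-per-CPU multiplier is the auto-configured value described above, and the worker count is copied from the !threadpool output in the question.

```python
def autoconfig_connection_limit(cpu_count: int, per_cpu: int = 12) -> int:
    """Connection limit under the auto-configured rule of 12 * #CPU (hypothetical helper)."""
    return per_cpu * cpu_count


cpus = 4              # number of CPUs on the machine in this dump
running_workers = 48  # "Running: 48" from the !threadpool output

limit = autoconfig_connection_limit(cpus)
print(f"limit={limit}, every allowed connection in use: {running_workers == limit}")
```

If the limit and the number of busy worker threads line up like this, the connection limit is worth checking before suspecting the thread pool.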

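The initial clue came from KB 821268, but it lacks detail on how to diagnose this problem. The verification came from this blog post. In short, I dumped all System.Net.ServicePoint objects on the heap. Each one has an m_CurrentConnections field and an m_ConnectionLimit field. Summing m_CurrentConnections across all of them gave the total number of open network connections, which was 48 for this dump. In addition, m_ConnectionLimit confirmed that the maximum was 48. The numbers matched: the maximum number of connections had been reached, which caused IIS to terminate the process. This was not a "deadlock", but unfortunately IIS is not very specific about the problem or its cause when it writes to the event log.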
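Once the two field values have been read off each ServicePoint object by hand, the comparison can be sketched as below. Everything here is an assumption for illustration: the record type, the helper, the address, and the single-entry sample. The dump only established that the totals were 48 and 48, not how they were split across objects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ServicePointSample:
    address: str              # hypothetical label for the remote endpoint
    current_connections: int  # value of m_CurrentConnections
    connection_limit: int     # value of m_ConnectionLimit


def summarize(samples: List[ServicePointSample]) -> Tuple[int, List[str]]:
    """Return total open connections and the addresses that are at their limit."""
    total = sum(s.current_connections for s in samples)
    saturated = [s.address for s in samples
                 if s.current_connections >= s.connection_limit]
    return total, saturated


# Illustrative values only, chosen to match the totals reported for this dump.
samples = [ServicePointSample("http://app-tier.example/", 48, 48)]
total, saturated = summarize(samples)
print(total, saturated)
```

Any address listed as saturated has no free connections left, so further HttpWebRequest calls to it have to wait.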