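I have a customer who occasionally reports IIS deadlocks. The product is a fairly large managed ASP.NET application that spans multiple servers. In this case, though, the problem occurs on a simple web server that either returns static files (HTML, JavaScript) or acts as a proxy and calls web services on the application tier. Note that this is a .NET 3.5 application, and the application pool uses the Classic pipeline.
I have a dump and have been analyzing it, but as far as I can tell, no resources are blocked in a way that would cause a deadlock.
The faulting thread is #4 and belongs to IIS. Its stack shows that it checks for health problems, finds one (a deadlock?), and fails the worker process.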
0:004> kv
# Child-SP RetAddr : Args to Child : Call Site
00 00000000`01b1e6c0 000007fe`f8c96d82 : 00000000`01b1e810 00000000`00000000 00000000`01b1e848 00000000`01b1e848 : KERNELBASE!RaiseException+0x39
01 00000000`01b1e790 000007fe`f80779cc : 00000000`05da4c58 00000000`00000082 00000000`05da4c58 00000000`00000082 : w3wphost!W3WP_HOST::FailWorkerProcess+0x2e
02 00000000`01b1e7e0 000007fe`f80728cb : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`000c000a : isapi!RegisterModule+0xcce4
03 00000000`01b1e830 000007fe`f806dd84 : 00000000`01b1ed40 00000000`00000020 00000000`00000004 000007fe`00000020 : isapi!RegisterModule+0x7be3
04 00000000`01b1ecf0 000007fe`f7f07459 : 00000000`05da4c58 00000000`05da4c58 00000000`05da4c58 000007fe`f806e06f : isapi!RegisterModule+0x309c
05 00000000`01b1edd0 000007fe`f7f07617 : 01cefba1`36242e6e 000007fe`f80419d6 00000000`00000000 00000000`05da4c58 : webengine!ReportHealthProblem+0xc9
06 00000000`01b1ef30 000007fe`f7f08d6b : 01cefba1`34feed30 01cefba1`36242e6e 00000000`05da4c58 00000000`00000000 : webengine!CheckAndReportHealthProblems+0xb7
07 00000000`01b1ef60 000007fe`f806c540 : 00000000`0126a088 00000000`05da4c58 00000000`01b1f330 000007fe`f8063588 : webengine!AspNetHttpExtensionProc+0x1db
...
The first thing I checked was for deadlocks with !dlk. None were found.
0:004> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
No deadlocks detected.
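Reportedly, !dlk misses some kinds of deadlock, so next I checked !threads to see whether any threads held locks. Quite a few do. These threads are calling web services on another server.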
0:004> !threads
ThreadCount: 63
UnstartedThread: 0
BackgroundThread: 57
PendingThread: 0
DeadThread: 6
Hosted Runtime: no
PreEmptive Lock
ID OSID ThreadOBJ State GC GC Alloc Context Domain Count APT Exception
7 1 96c 00000000021394a0 8220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
16 2 9a4 0000000002142250 b220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Finalizer)
17 4 a64 00000000021913c0 80a220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Completion Port)
18 5 a70 00000000021930f0 1220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
20 e b6c 00000000022700e0 880b220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Completion Port)
6 b 968 0000000005aaf0b0 220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
4 41 960 0000000005ab0220 220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
5 4f 964 0000000005a647c0 220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
30 ac 700 0000000005aafc50 220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
XXXX a7 0 0000000005ab1960 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Worker)
XXXX aa 0 0000000005dcf540 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Worker)
XXXX be 0 0000000005dce3d0 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn (Threadpool Worker)
XXXX ae 0 0000000005dcd830 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Worker)
XXXX 38 0 0000000005ab0dc0 9820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA
XXXX 39 0 0000000005ab2500 9820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA
31 37 133c 0000000005ab07f0 180b220 Enabled 00000001c0668760:00000001c0668d50 0000000002193ba0 1 MTA (Threadpool Worker)
32 34 ce8 0000000005d93550 180b220 Enabled 00000001806563a8:0000000180657ad0 0000000002193ba0 1 MTA (Threadpool Worker)
33 c4 be4 0000000005d90ca0 180b220 Enabled 00000001c065e7b0:00000001c065ed50 0000000002248da0 1 MTA (Threadpool Worker)
34 101 5a0 0000000005d923e0 180b220 Enabled 000000014072bfa0:000000014072daa0 0000000002248da0 1 MTA (Threadpool Worker)
36 c5 6c0 0000000005d906d0 180b220 Enabled 000000010004d360:000000010004e150 0000000002193ba0 1 MTA (Threadpool Worker)
37 35 76c 0000000005d940f0 180b220 Enabled 000000010005a950:000000010005c150 0000000002248da0 1 MTA (Threadpool Worker)
38 36 cdc 0000000005d92f80 180b220 Enabled 0000000100069c20:000000010006a150 0000000002193ba0 1 MTA (Threadpool Worker)
39 33 b90 0000000005d91e10 180b220 Enabled 000000014072f460:000000014072faa0 0000000002248da0 1 MTA (Threadpool Worker)
40 32 12b8 0000000005d929b0 180b220 Enabled 0000000100083520:0000000100084150 0000000002193ba0 1 MTA (Threadpool Worker)
41 31 11d8 0000000005d95e00 180b220 Enabled 0000000100091cb0:0000000100092150 0000000002193ba0 1 MTA (Threadpool Worker)
42 30 d78 0000000005d946c0 180b220 Enabled 00000001000a3af8:00000001000a4150 0000000002193ba0 1 MTA (Threadpool Worker)
43 2f bd8 0000000005d95830 180b220 Enabled 00000001000b1200:00000001000b2150 0000000002193ba0 1 MTA (Threadpool Worker)
44 2e 598 0000000005d91840 180b220 Enabled 00000001000bf808:00000001000c0150 0000000002193ba0 1 MTA (Threadpool Worker)
45 2d ba0 0000000005d93b20 180b220 Enabled 00000001000cd698:00000001000ce150 0000000002193ba0 1 MTA (Threadpool Worker)
46 2c 136c 0000000005d94c90 180b220 Enabled 00000001000df068:00000001000e0150 0000000002193ba0 1 MTA (Threadpool Worker)
47 ca 8f0 0000000005d90100 180b220 Enabled 00000001000ed618:00000001000ee150 0000000002193ba0 1 MTA (Threadpool Worker)
48 102 d14 0000000005d95260 180b220 Enabled 00000001000fc7d0:00000001000fe150 0000000002193ba0 1 MTA (Threadpool Worker)
49 cb 12c0 0000000005d91270 180b220 Enabled 000000010010cf88:000000010010e150 0000000002193ba0 1 MTA (Threadpool Worker)
50 c7 e98 0000000005d969a0 180b220 Enabled 000000010011c618:000000010011e150 0000000002248da0 1 MTA (Threadpool Worker)
51 e2 d74 0000000005d96f70 180b220 Enabled 000000010012b758:000000010012c150 0000000002248da0 1 MTA (Threadpool Worker)
52 c2 1278 0000000005d97540 180b220 Enabled 00000001001395e0:000000010013a150 0000000002248da0 1 MTA (Threadpool Worker)
53 c8 8e0 0000000005d963d0 180b220 Enabled 0000000100148898:000000010014a150 0000000002193ba0 1 MTA (Threadpool Worker)
54 c6 24c 0000000005aaf680 180b220 Enabled 00000001001595d8:000000010015a150 0000000002193ba0 1 MTA (Threadpool Worker)
55 c9 708 0000000005ab1f30 180b220 Enabled 0000000180658120:0000000180659ad0 0000000002248da0 1 MTA (Threadpool Worker)
56 c3 110c 0000000005ab1390 180b220 Enabled 0000000100176ce8:0000000100178150 0000000002248da0 1 MTA (Threadpool Worker)
57 cd 8dc 0000000005dc8100 180b220 Enabled 000000010018c0f8:000000010018c150 0000000002193ba0 1 MTA (Threadpool Worker)
58 d1 588 0000000005dca9b0 180b220 Enabled 000000010019a620:000000010019c150 0000000002193ba0 1 MTA (Threadpool Worker)
59 d0 31c 0000000005dc8ca0 180b220 Enabled 00000001001ab9a0:00000001001ac150 0000000002193ba0 1 MTA (Threadpool Worker)
60 1a cb4 0000000005dca3e0 180b220 Enabled 00000001001ba7f8:00000001001bc150 0000000002193ba0 1 MTA (Threadpool Worker)
61 1b 13cc 0000000005dc9840 180b220 Enabled 00000001001ca798:00000001001cc150 0000000002193ba0 1 MTA (Threadpool Worker)
62 1c 12f4 0000000005dccc90 180b220 Enabled 00000001001da7d0:00000001001dc150 0000000002193ba0 1 MTA (Threadpool Worker)
63 1d 11c8 0000000005dc86d0 180b220 Enabled 00000001001eab48:00000001001ec150 0000000002193ba0 1 MTA (Threadpool Worker)
64 1e 1304 0000000005dcde00 180b220 Enabled 00000001001fa960:00000001001fc150 0000000002248da0 1 MTA (Threadpool Worker)
65 1f 1258 0000000005dcbb20 180b220 Enabled 000000010020ab18:000000010020c150 0000000002193ba0 1 MTA (Threadpool Worker)
66 20 854 0000000005dc9270 180b220 Enabled 00000001407307c8:0000000140731aa0 0000000002193ba0 1 MTA (Threadpool Worker)
67 21 13bc 0000000005dcaf80 180b220 Enabled 000000010022bd30:000000010022c150 0000000002248da0 1 MTA (Threadpool Worker)
68 22 c4c 0000000005dc9e10 180b220 Enabled 00000001002409f0:0000000100242150 0000000002193ba0 1 MTA (Threadpool Worker)
69 23 10dc 0000000005dcc6c0 180b220 Enabled 0000000100251c70:0000000100252150 0000000002193ba0 1 MTA (Threadpool Worker)
70 24 264 0000000005dcc0f0 180b220 Enabled 0000000100261288:0000000100262150 0000000002193ba0 1 MTA (Threadpool Worker)
71 25 3c8 0000000005dcb550 180b220 Enabled 0000000100271688:0000000100272150 0000000002193ba0 1 MTA (Threadpool Worker)
72 26 b88 0000000005dcd260 180b220 Enabled 0000000100287420:0000000100288150 0000000002193ba0 1 MTA (Threadpool Worker)
74 27 1318 0000000005dce9a0 180b220 Enabled 00000001002975c8:0000000100298150 0000000002248da0 1 MTA (Threadpool Worker)
75 28 bdc 0000000005dcef70 180b220 Enabled 00000001002a6d48:00000001002a8150 0000000002193ba0 1 MTA (Threadpool Worker)
76 29 100 0000000005a65930 180b220 Enabled 00000001002b7698:00000001002b8150 0000000002193ba0 1 MTA (Threadpool Worker)
77 2a e5c 0000000005a67070 180b220 Enabled 00000001002c70f8:00000001002c8150 0000000002248da0 1 MTA (Threadpool Worker)
78 2b 434 0000000005a67c10 180b220 Enabled 00000001002d6b78:00000001002d8150 0000000002193ba0 1 MTA (Threadpool Worker)
79 cc e4c 0000000005a65360 180b220 Enabled 00000001002e65f8:00000001002e8150 0000000002193ba0 1 MTA (Threadpool Worker)
80 3a dd0 0000000005a64d90 180b220 Enabled 00000001002f66c0:00000001002f8150 0000000002193ba0 1 MTA (Threadpool Worker)
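It is easy to make mistakes when you pick the lock holders out of a listing this long by eye. The following is a minimal Python sketch that filters !threads rows on the Lock Count column. The function name is hypothetical. The sketch assumes the whitespace-separated column order shown above: debugger ID, managed ID, OSID, ThreadOBJ, State, GC mode, GC alloc context, Domain, Lock Count, APT, and so on. It is not a general SOS parser, and other SOS versions may lay the columns out differently.

```python
def threads_holding_locks(threads_output: str) -> list[tuple[str, str, int]]:
    """Return (debugger id, OS thread id, lock count) for !threads rows with Lock Count > 0.

    Assumes the whitespace-separated row layout shown in the listing above:
    DbgId ID OSID ThreadOBJ State GCMode AllocContext Domain LockCount APT ...
    """
    held = []
    for line in threads_output.splitlines():
        cols = line.split()
        # Data rows carry the GC mode in column 5; header and summary lines do not.
        if len(cols) < 10 or cols[5] not in ("Enabled", "Disabled"):
            continue
        try:
            lock_count = int(cols[8])
        except ValueError:
            continue
        if lock_count > 0:
            held.append((cols[0], cols[2], lock_count))
    return held


threads_sample = """\
7 1 96c 00000000021394a0 8220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
XXXX a7 0 0000000005ab1960 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Worker)
31 37 133c 0000000005ab07f0 180b220 Enabled 00000001c0668760:00000001c0668d50 0000000002193ba0 1 MTA (Threadpool Worker)
"""
print(threads_holding_locks(threads_sample))  # [('31', '133c', 1)]
```

When applied to the complete listing above, it should return 48 rows. All of them are thread-pool workers, and each holds one lock.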
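Next I checked whether any threads were hung. Every thread was under a minute. The threads over 10 seconds all belong to the ASP.NET infrastructure and weren't running any code.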
0:004> !runaway
User Mode Time
Thread Time
4:960 0 days 0:00:39.843
5:964 0 days 0:00:33.281
6:968 0 days 0:00:25.906
7:96c 0 days 0:00:24.000
31:133c 0 days 0:00:09.437
14:99c 0 days 0:00:06.953
32:ce8 0 days 0:00:06.921
15:9a0 0 days 0:00:06.890
13:998 0 days 0:00:06.750
30:700 0 days 0:00:06.062
33:be4 0 days 0:00:04.937
12:994 0 days 0:00:04.640
26:9f8 0 days 0:00:02.859
34:5a0 0 days 0:00:01.203
16:9a4 0 days 0:00:00.765
0:934 0 days 0:00:00.062
17:a64 0 days 0:00:00.031
65:1258 0 days 0:00:00.015
52:1278 0 days 0:00:00.015
21:1338 0 days 0:00:00.015
80:dd0 0 days 0:00:00.000
79:e4c 0 days 0:00:00.000
78:434 0 days 0:00:00.000
77:e5c 0 days 0:00:00.000
76:100 0 days 0:00:00.000
75:bdc 0 days 0:00:00.000
74:1318 0 days 0:00:00.000
73:77c 0 days 0:00:00.000
72:b88 0 days 0:00:00.000
71:3c8 0 days 0:00:00.000
70:264 0 days 0:00:00.000
69:10dc 0 days 0:00:00.000
68:c4c 0 days 0:00:00.000
67:13bc 0 days 0:00:00.000
66:854 0 days 0:00:00.000
64:1304 0 days 0:00:00.000
63:11c8 0 days 0:00:00.000
62:12f4 0 days 0:00:00.000
61:13cc 0 days 0:00:00.000
60:cb4 0 days 0:00:00.000
59:31c 0 days 0:00:00.000
58:588 0 days 0:00:00.000
57:8dc 0 days 0:00:00.000
56:110c 0 days 0:00:00.000
55:708 0 days 0:00:00.000
54:24c 0 days 0:00:00.000
53:8e0 0 days 0:00:00.000
51:d74 0 days 0:00:00.000
50:e98 0 days 0:00:00.000
49:12c0 0 days 0:00:00.000
48:d14 0 days 0:00:00.000
47:8f0 0 days 0:00:00.000
46:136c 0 days 0:00:00.000
45:ba0 0 days 0:00:00.000
44:598 0 days 0:00:00.000
43:bd8 0 days 0:00:00.000
42:d78 0 days 0:00:00.000
41:11d8 0 days 0:00:00.000
40:12b8 0 days 0:00:00.000
39:b90 0 days 0:00:00.000
38:cdc 0 days 0:00:00.000
37:76c 0 days 0:00:00.000
36:6c0 0 days 0:00:00.000
35:11b4 0 days 0:00:00.000
29:11b8 0 days 0:00:00.000
28:438 0 days 0:00:00.000
27:12d8 0 days 0:00:00.000
25:e44 0 days 0:00:00.000
24:670 0 days 0:00:00.000
23:1284 0 days 0:00:00.000
22:139c 0 days 0:00:00.000
20:b6c 0 days 0:00:00.000
19:adc 0 days 0:00:00.000
18:a70 0 days 0:00:00.000
11:990 0 days 0:00:00.000
10:988 0 days 0:00:00.000
9:984 0 days 0:00:00.000
8:970 0 days 0:00:00.000
3:954 0 days 0:00:00.000
2:940 0 days 0:00:00.000
1:938 0 days 0:00:00.000
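The same 10-second cut-off can be applied mechanically. The following is a rough Python sketch. It assumes the row format shown above: debugger id, colon, OS thread id, then "<d> days <h>:<mm>:<ss.fff>". The helper runaway_over is hypothetical and is not part of any debugger extension.

```python
import re

# Matches rows such as "4:960 0 days 0:00:39.843" from the !runaway output above.
RUNAWAY_ROW = re.compile(
    r"^\s*(\d+):([0-9a-fA-F]+)\s+(\d+)\s+days\s+(\d+):(\d+):(\d+(?:\.\d+)?)\s*$"
)


def runaway_over(output: str, threshold_s: float) -> list[tuple[int, str, float]]:
    """Return (debugger id, OS thread id, user-mode seconds) above threshold_s, busiest first."""
    rows = []
    for line in output.splitlines():
        m = RUNAWAY_ROW.match(line)
        if not m:
            continue  # header lines such as "User Mode Time"
        dbg_id, tid, days, hours, minutes, seconds = m.groups()
        total = int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        if total > threshold_s:
            rows.append((int(dbg_id), tid, total))
    return sorted(rows, key=lambda r: r[2], reverse=True)


runaway_sample = """\
User Mode Time
Thread Time
4:960 0 days 0:00:39.843
31:133c 0 days 0:00:09.437
7:96c 0 days 0:00:24.000
"""
print(runaway_over(runaway_sample, 10))  # [(4, '960', 39.843), (7, '96c', 24.0)]
```

With a threshold of 10 seconds, the full listing yields threads 4, 5, 6 and 7.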
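I was still curious about the locks held by those threads, so I checked !mlocks. It showed that all of the locks are on instances of System.Web.HttpApplication subclasses created by ASP.NET. There was no recursion either, so this also looked fine.
Next I checked !threadpool. I haven't used it much, so I'm not sure I'm interpreting the output correctly. However, it looks like the application hasn't reached the limit (400) and no requests are waiting, so this seems OK as well.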
0:004> !threadpool
CPU utilization 0%
Worker Thread: Total: 48 Running: 48 Idle: 0 MaxLimit: 400 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 34
--------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 8 CurrentLimit: 0 MaxLimit: 400 MinLimit: 4
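The numbers in the Worker Thread line answer slightly different questions. Total, Running and Idle describe the worker threads that exist right now. MaxLimit is the ceiling the pool is allowed to grow to. Below is a minimal sketch of that reading. The helper worker_pool_summary is hypothetical, and the regular expression assumes the exact line format shown above.

```python
import re

# Assumes the exact "Worker Thread:" line format printed by !threadpool above.
WORKER_LINE = re.compile(
    r"Worker Thread:\s*Total:\s*(\d+)\s+Running:\s*(\d+)\s+Idle:\s*(\d+)"
    r"\s+MaxLimit:\s*(\d+)\s+MinLimit:\s*(\d+)"
)


def worker_pool_summary(output: str) -> dict:
    """Parse the worker-thread line and separate 'all busy' from 'at the limit'."""
    m = WORKER_LINE.search(output)
    if m is None:
        raise ValueError("no 'Worker Thread:' line found")
    total, running, idle, max_limit, min_limit = map(int, m.groups())
    return {
        "total": total,
        "running": running,
        "idle": idle,
        "max_limit": max_limit,
        "min_limit": min_limit,
        # Every worker thread that currently exists is busy...
        "all_busy": idle == 0 and running == total,
        # ...but the pool could still create this many more before reaching MaxLimit.
        "headroom": max_limit - total,
    }


threadpool_sample = """\
CPU utilization 0%
Worker Thread: Total: 48 Running: 48 Idle: 0 MaxLimit: 400 MinLimit: 4
Work Request in Queue: 0
"""
summary = worker_pool_summary(threadpool_sample)
print(summary["all_busy"], summary["headroom"])  # True 352
```

For this dump, the sketch reports all_busy as True with a headroom of 352 threads. The output also shows "Work Request in Queue: 0". So every existing worker was occupied, but the pool itself had not reached its limit.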
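I've been analyzing this problem for several days and don't know how to make sense of it. I'd welcome any insight into how to identify the "health problem" that IIS detected.
Update: The thread stacks are too large to include here, so I've uploaded them here. For privacy, company types have been renamed to Foo.Bar. Method names such as 0 are the result of actual obfuscation.
Update 2: Thanks to the comments, I found KB 821268, which looks useful. I must not be interpreting the output of !threadpool correctly. It reports Total: 48, Running: 48 (and Idle: 0), which could mean the pool is exhausted. If so, I don't understand what that means for MaxLimit: 400. Perhaps someone can set me straight on this.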
Answer
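After some research and analysis, I was able to find the cause of the problem. I was reading too much into the output of !threadpool. The data is useful, but all it told me is that I had not exhausted the worker or completion port threads.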
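Instead, the real problem was the number of active network connections. This value (maxconnection) is set in the .config file. In .NET 2.0 and later, it is usually auto-configured to 12 * #CPUs. This dump has 4 CPUs, which gives a maximum of 48 open connections. That matches the number of threads reading data through HttpWebRequest, as shown in the stacks.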
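The arithmetic behind the limit is small enough to spell out. This sketch simply encodes the 12-per-CPU rule stated above, and the helper names are hypothetical. It treats the limit as a single number, whereas .NET actually applies maxconnection per ServicePoint (per remote endpoint).

```python
def auto_config_max_connections(cpu_count: int, per_cpu: int = 12) -> int:
    """Connection limit derived from the 12-connections-per-CPU rule described above."""
    return per_cpu * cpu_count


def connection_limit_reached(cpu_count: int, threads_in_http_request: int) -> bool:
    """True when threads blocked in HttpWebRequest already occupy every allowed connection."""
    return threads_in_http_request >= auto_config_max_connections(cpu_count)


print(auto_config_max_connections(4))   # 48
print(connection_limit_reached(4, 48))  # True
```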
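The initial clue came from KB 821268, but it lacks details on how to diagnose the problem. The confirmation came from this blog post. In short, I dumped all System.Net.ServicePoint objects on the heap. Each one has m_CurrentConnections and m_ConnectionLimit fields. Summing all m_CurrentConnections values gives the total number of open network connections, which was 48 for this dump. In addition, m_ConnectionLimit confirmed that the maximum is 48. The numbers match: the maximum number of connections had been reached, and this caused IIS to kill the process. It isn't a "deadlock". Unfortunately, the message IIS leaves in the event log isn't very specific about the problem or its cause.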
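The check described above reduces to a sum and a comparison. Below is a minimal sketch. It assumes the m_CurrentConnections and m_ConnectionLimit values have already been read out of the dump by hand, for example with SOS's !dumpheap -type System.Net.ServicePoint followed by !do on each object. The class, field and function names are hypothetical. The limit is enforced per ServicePoint, so the sketch reports the total and also flags each ServicePoint that has reached its own limit.

```python
from dataclasses import dataclass


@dataclass
class ServicePointSnapshot:
    """Field values copied by hand from one System.Net.ServicePoint object in the dump."""
    address: str              # hypothetical label for the remote endpoint
    current_connections: int  # m_CurrentConnections
    connection_limit: int     # m_ConnectionLimit


def summarize(points: list[ServicePointSnapshot]) -> tuple[int, list[str]]:
    """Return the total number of open connections and the endpoints whose limit is reached."""
    total_open = sum(p.current_connections for p in points)
    full = [p.address for p in points if p.current_connections >= p.connection_limit]
    return total_open, full


# Made-up values shaped like the figures in this dump.
points = [
    ServicePointSnapshot("app-tier", current_connections=48, connection_limit=48),
    ServicePointSnapshot("other-host", current_connections=0, connection_limit=48),
]
print(summarize(points))  # (48, ['app-tier'])
```

A real run would use one entry per ServicePoint found on the heap.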