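Ever since we upgraded from ColdFusion 8 Enterprise to ColdFusion 9 Enterprise, we have been dealing with a problem with event gateways.
We have an event gateway set up to establish a connection with a third party. They send us updates at least every 10 seconds, and sometimes several times within that window.
We have a Java class configured as the event gateway listener, and it pushes events and data to a CFC function. In that function we use a named <cflock> to ensure that requests are processed in order; at this point requests queue up to get exclusive access to the named lock. The lock has a 30-second timeout.
I also have a lot of debugging in this function, and I have noticed a few things:
Requests queue up before the <cflock> tag, waiting for the lock, and this queue can exceed 40 events.
So the problem is that, every now and again, I get a lock timeout, and it happens after 30 seconds. First I log that the request is waiting for the lock. It looks like this: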
"Information","Thread-23","06/23/10","15:45:18","APP1","F4B42A5A-F34D-C614-DE01B150E6906F78 (1277304318606) : PRE LOCK"
Then, further down in the log, I see this for the same request:
"Error","Thread-23","06/23/10","15:45:48","APP1","F4B42A5A-F34D-C614-DE01B150E6906F78 (1277304348607) : LOCK ERROR: A timeout occurred while attempting to lock lock_ResponseDispatcher."
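There are 30 seconds between the two entries. At this point the request, and any event data associated with it, is lost. Not good for me.
So I thought I would check whether the queue was being processed fast enough. I'm not sure how <cflock> queues up events. Is there a hard limit?
Anyway, in this particular instance, I saw the following:
The request entered the queue when there were already 6 requests in it, so the queue length was 7
Over the next 30 seconds, about 17 requests were removed from the queue
About the same number of requests were added to the queue
During this time the request in question was not processed, and after 30 seconds it timed out
I couldn't believe my eyes! It looks as if the <cflock> queue is not first in, first out (FIFO) but first in, last out (FILO)!
Is such a thing possible? Has anyone else seen this behavior?
Many thanks in advance to anyone with any ideas.
Ciaran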
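Answer 0 (score: 2)
I think the key here is the fact that I am using event gateways, which are asynchronous. In fact, after some experimenting, the cause of my problem seems fairly obvious :)
I have specified the number of threads available for processing event gateway requests in the CF Administrator (see Event Gateways > Settings). This setting is fixed at 1 on CF Developer Edition but can be increased on Enterprise Edition. I increased it to 5 for this experiment. The strange behavior only shows up once it is increased.
My event gateway code is very simple: it creates a UUID (so I can track the request in my logs) and then pauses the thread for 5 seconds to simulate real processing. The sleep happens inside the cflock call, so only one thread can do the processing at a time. We need this to avoid processing duplicates in the real code.
Here is the full CFC: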
component {
    public void function onIncomingMessage (required struct msg) {
        var sys = createObject("java", "java.lang.System");
        var tag = createUUID();
        var logFile = "test\gatewaytest";
        writelog (file=logFile, text="#tag# - about to queue");
        try {
            lock name="myTestLock" timeout="120" {
                writelog (file=logFile, text="#tag# - got lock");
                thread action="sleep" duration="5000"; //ms
            }
            writelog (file=logFile, text="#tag# - released lock");
        } catch (any e) {
            writelog (file=logFile, text="#tag# - ERROR - #e.message#");
        }
    }
}
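Note the very long timeout (2 minutes) on the lock. This is there to cope with the problem caused by the event gateway's asynchronous processing.
The event gateway is a simple built-in CFML type gateway with the ID 'TestGW', which I linked to the CFC above.
I set up a simple script to send events to the event gateway. Here it is in full: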
<cfset msg = {mymessage = "hello gateway"} />
<cfset sendGatewayMessage("TestGW", msg) />
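To produce a burst of requests like the ones logged in the scenarios that follow, the same call can be wrapped in a loop. This is only a sketch: the count of 10 and the `seq` key are illustrative assumptions, not part of the original test script.

```cfml
<!--- Send a burst of messages to the TestGW gateway instance. --->
<!--- The loop count (10) and the seq key are illustrative assumptions. --->
<cfloop from="1" to="10" index="i">
    <cfset msg = {mymessage = "hello gateway", seq = i} />
    <cfset sendGatewayMessage("TestGW", msg) />
</cfloop>
```

Because the CFML gateway handles messages asynchronously, the loop returns quickly and the messages are then picked up by however many gateway processing threads are configured.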
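Scenario 1 - single thread:
If the number of event gateway processing threads is set to 1 and I hammer the gateway, I see the following log output: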
"Information","Thread-17","06/25/10","10:32:09",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - about to queue"
"Information","Thread-17","06/25/10","10:32:09",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - got lock"
"Information","Thread-17","06/25/10","10:32:14",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - released lock"
"Information","Thread-17","06/25/10","10:32:14",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - about to queue"
"Information","Thread-17","06/25/10","10:32:14",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - got lock"
"Information","Thread-17","06/25/10","10:32:19",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - released lock"
"Information","Thread-17","06/25/10","10:32:19",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - about to queue"
"Information","Thread-17","06/25/10","10:32:19",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - got lock"
"Information","Thread-17","06/25/10","10:32:24",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - released lock"
"Information","Thread-17","06/25/10","10:32:24",,"5082A5E1-1C23-9073-674E9467F395686F - about to queue"
"Information","Thread-17","06/25/10","10:32:24",,"5082A5E1-1C23-9073-674E9467F395686F - got lock"
"Information","Thread-17","06/25/10","10:32:29",,"5082A5E1-1C23-9073-674E9467F395686F - released lock"
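The key thing to notice here is that it is single-threaded. Events are queued one at a time, and everything happens in order.
Scenario 2 - more threads:
If the number of event gateway processing threads is increased to 5 and I hammer the gateway, I see the following log output: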
"Information","Thread-18","06/25/10","11:26:01",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - about to queue"
"Information","Thread-18","06/25/10","11:26:01",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - got lock"
"Information","Thread-27","06/25/10","11:26:01",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - about to queue"
"Information","Thread-21","06/25/10","11:26:02",,"526CDEED-C2B3-3C92-0F57CFA317AC02F8 - about to queue"
"Information","Thread-20","06/25/10","11:26:02",,"526CEE25-F25B-890C-F7501B5489C6BB21 - about to queue"
"Information","Thread-25","06/25/10","11:26:02",,"526CFD3C-EAFD-40E7-EBA2BE59B87D5936 - about to queue"
"Information","Thread-24","06/25/10","11:26:03",,"526D0FC5-E5E2-642E-452636C8838ADE33 - about to queue"
"Information","Thread-26","06/25/10","11:26:03",,"526D1096-C82E-535B-36D57D3A431D1436 - about to queue"
"Information","Thread-23","06/25/10","11:26:03",,"526D1F9C-9A9C-FA84-E153A944123E77BE - about to queue"
"Information","Thread-19","06/25/10","11:26:04",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - about to queue"
"Information","Thread-22","06/25/10","11:26:04",,"526D3F09-073F-2B0C-E94652D1C95B09CB - about to queue"
"Information","Thread-18","06/25/10","11:26:06",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - released lock"
"Information","Thread-22","06/25/10","11:26:06",,"526D3F09-073F-2B0C-E94652D1C95B09CB - got lock"
"Information","Thread-22","06/25/10","11:26:11",,"526D3F09-073F-2B0C-E94652D1C95B09CB - released lock"
"Information","Thread-27","06/25/10","11:26:11",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - got lock"
"Information","Thread-27","06/25/10","11:26:16",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - released lock"
"Information","Thread-19","06/25/10","11:26:16",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - got lock"
"Information","Thread-19","06/25/10","11:26:21",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - released lock"
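Pay particular attention to the request with UUID 526D3F09-073F-2B0C-E94652D1C95B09CB. It is the last request logged, so it is at the back of the queue. However, as soon as the lock becomes available, it jumps in and grabs the lock ahead of the 526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 request, which had been waiting the longest.
Conclusion:
When the event gateway uses more than one thread, we cannot guarantee the order in which the threads waiting on cflock are processed. We therefore need to make sure the lock timeout is high enough that, at busy times, the whole event queue can be processed before any single request exceeds the lock timeout.
I think this should let us keep using cflock with multi-threaded event gateways!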
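As a rough way to size that timeout: if waiters can be served in any order, the worst case for one request is that every other queued request is processed first. The sketch below assumes each event holds the lock for a roughly constant time. The function name, the safety factor and the sample figures (a queue of 40 as mentioned in the question, 5 seconds per event as in the test CFC) are illustrative only.

```cfml
<cfscript>
// Hypothetical helper: worst-case wait is the whole queue being processed first,
// multiplied by a safety margin for bursts that arrive while we wait.
function minLockTimeout(required numeric queueDepth, required numeric secondsPerEvent, numeric safetyFactor = 1.5) {
    return ceiling(arguments.queueDepth * arguments.secondsPerEvent * arguments.safetyFactor);
}

// 40 queued events * 5 seconds each * 1.5 margin = 300 seconds
writeOutput(minLockTimeout(40, 5));
</cfscript>
```

Since new requests keep arriving while others wait, no finite timeout strictly guarantees that a request gets the lock; the margin only makes a timeout less likely.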
I hope this helps anyone else who runs into this problem.
Cheers, Ciaran.
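Answer 1 (score: 1)
I'm not sure how this relates to your FIFO vs. LIFO question, but I can offer a suggestion for this part:
"There are 30 seconds between the two entries. At this point the request, and any event data associated with it, is lost. Not good for me."
The CFLock tag has an attribute called throwOnTimeout, which defaults to true. If you set it to false, then on a timeout, instead of throwing an exception, processing skips the locked block of code and carries on as normal. You can use this to your advantage, for example: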
<cfset made_it_through_lock = false />
<!--- timeout is a required cflock attribute; 30 seconds matches the question --->
<cflock name="single_threaded_lock_name" timeout="30" throwOnTimeout="false">
    <!--- ... do stuff ... --->
    <cfset made_it_through_lock = true />
</cflock>
<cfif made_it_through_lock eq false>
    <!---
        log the event data that you don't want to lose, then abort,
        setting the necessary http status code & headers
    --->
</cfif>
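Applied to the gateway listener from the first answer, the same flag pattern might look like the sketch below. It assumes the script form of lock accepts the same attributes as the cflock tag, and it logs the whole incoming event as JSON so it can be replayed later; the log file name and the message text are illustrative.

```cfml
component {
    public void function onIncomingMessage (required struct msg) {
        var logFile = "gatewaytest";   // illustrative log file name
        var gotLock = false;

        // With throwontimeout="false" the block is skipped on timeout instead of throwing.
        lock name="myTestLock" type="exclusive" timeout="30" throwontimeout="false" {
            gotLock = true;
            // ... real processing goes here ...
        }

        if (!gotLock) {
            // Keep the event data rather than losing it with the timed-out request.
            writelog (file=logFile, type="warning",
                text="lock timed out, event not processed: #serializeJSON(arguments.msg)#");
        }
    }
}
```

An event gateway request has no HTTP response, so here the skipped event is only logged; how it gets reprocessed is left open.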