IBM WebSphere MQ cluster channels end abnormally and recover frequently

Time: 2012-08-04 16:16:59

Tags: ibm-mq

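In a clustered environment, I see the channels for one particular server ending abnormally and recovering frequently throughout the day. For example, QMGR A is connected to several QMGRs (B, C, D, E, F), each on a different server.
The cluster receiver channels from QMGRs B, C, D, E and F end abnormally on QMGR A and recover very frequently within a day.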

QMGR A LOGS


    -------------------------------------------------------------------------------  
08/04/12 08:44:41 - Process(1720412.1165) User(mqad) Program(amqrmppa)  
AMQ9209: Connection to host 'HOST.B (139.120.210.19)' closed.  

EXPLANATION:  
An error occurred receiving data from 'HOST.B (139.120.210.19)' over TCP/IP.  
 The connection to the remote host has unexpectedly terminated.  
ACTION:  
Tell the systems administrator.  
----- amqccita.c : 3094 -------------------------------------------------------  
08/04/12 08:44:41 - Process(1720412.1165) User(mqad) Program(amqrmppa)  
AMQ9999: Channel program ended abnormally.  

EXPLANATION:  
Channel program 'CHANNEL.TO.B' ended abnormally.  
ACTION:  
Look at previous error messages for channel program 'CHANNEL.TO.B' in the  
error files to determine the cause of the failure.  
----- amqrccca.c : 777 --------------------------------------------------------  
08/04/12 08:44:41 - Process(1720412.1175) User(mqad) Program(amqrmppa)  
AMQ9209: Connection to host 'HOST.C (155.10.186.20)' closed.  

EXPLANATION:  
An error occurred receiving data from 'HOST.C (155.10.186.20)' over TCP/IP.  
The connection to the remote host has unexpectedly terminated.  
ACTION:  
Tell the systems administrator.  
----- amqccita.c : 3094 -------------------------------------------------------  
08/04/12 08:44:41 - Process(1720412.1175) User(mqad) Program(amqrmppa)  
AMQ9999: Channel program ended abnormally.  

EXPLANATION:  
Channel program 'CHANNEL.TO.C' ended abnormally.  
ACTION:  
Look at previous error messages for channel program 'CHANNEL.TO.C' in the  
error files to determine the cause of the failure.  
    -------------------------------------------------------------------------------  

QMGR LOGS ON HOST B


08/04/2012 08:44:09 AM - Process(17174.16023) User(mqad) Program(amqrmppa)
AMQ9259: Connection timed out from host 'HOST.A'.

EXPLANATION:
A connection from host 'HOST.A' over TCP/IP timed out.
ACTION:
Check to see why data was not received in the expected time. Correct the
problem. Reconnect the channel, or wait for a retrying channel to reconnect
itself.
----- amqccita.c : 3546 -------------------------------------------------------
08/04/2012 08:44:09 AM - Process(17174.16023) User(mqad) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'CHANNEL.TO.B' ended abnormally.
ACTION:
Look at previous error messages for channel program 'CHANNEL.TO.B' in the
error files to determine the cause of the failure.


QMGR LOGS ON HOST C

-------------------------------------------------------------------------------
08/04/12 08:44:35 - Process(462890.4658) User(mqad) Program(amqrmppa)
AMQ9259: Connection timed out from host 'HOST.A'.

EXPLANATION:
A connection from host 'HOST.A' over TCP/IP timed out.
ACTION:
Check to see why data was not received in the expected time. Correct the
problem. Reconnect the channel, or wait for a retrying channel to reconnect
itself.
----- amqccita.c : 3341 -------------------------------------------------------
08/04/12 08:44:35 - Process(462890.4658) User(mqad) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'CHANNEL.TO.C' ended abnormally.
ACTION:
Look at previous error messages for channel program 'CHANNEL.TO.C' in the
error files to determine the cause of the failure.
----- amqrmrsa.c : 468 --------------------------------------------------------

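I am trying to understand what could be causing this. Could it happen if queue manager A is overloaded with too many connections? I don't see any TCP/IP error codes in the qmgr logs.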

1 answer:

Answer 0 (score: 4)

It looks like you are running a version of MQ earlier than V7.1? In MQ V7.1 the error messages were updated from:

AMQ9259: Connection timed out from host 'HOST.A'.

EXPLANATION:
A connection from host 'HOST.A' over TCP/IP timed out.
ACTION:
Check to see why data was not received in the expected time. Correct the
problem. Reconnect the channel, or wait for a retrying channel to reconnect
itself.

to:

AMQ9259: Connection timed out from host 'HOST.A'.

EXPLANATION:
A connection from host 'HOST.A' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 60 seconds call timed out. Check to see why data was
not received in the expected time. Correct the problem. Reconnect the channel,
or wait for a retrying channel to reconnect itself.

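as an example. The most likely cause of the AMQ9259 error message is that your receive timeout settings have caused the channel to drop out of its receive and close the channel down. I suggest you look at the receive timeout settings in your qm.ini file and see whether they have been set to something shorter than the heartbeat interval.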
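
The comparison suggested here can be written down as a minimal sketch. The function name and the sample numbers are hypothetical; the real values would come from your qm.ini and from the heartbeat interval (HBINT) of each channel.

```python
def receive_timeout_at_risk(receive_timeout_s, heartbeat_interval_s):
    """Return True if the receive timeout is shorter than the heartbeat interval.

    On an idle channel nothing may arrive until the next heartbeat, so a
    shorter receive timeout can expire first and close the channel.
    """
    return receive_timeout_s < heartbeat_interval_s


if __name__ == "__main__":
    # Hypothetical values in seconds: (receive timeout, heartbeat interval).
    samples = {"CHANNEL.TO.B": (60, 300), "CHANNEL.TO.C": (360, 300)}
    for channel, (timeout_s, hbint_s) in samples.items():
        status = "at risk" if receive_timeout_at_risk(timeout_s, hbint_s) else "ok"
        print(channel, status)
```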

The channels restart themselves automatically because you have retry intervals defined on them. This is good!
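
To see how often each partner is actually dropping, before and after changing any timeout setting, something like the following could tally the AMQ9209 and AMQ9259 entries per host and day. It is only a sketch that assumes the error-log layout quoted in the question (a timestamp header line followed by the message line); the function name is made up for illustration.

```python
import re
from collections import Counter

# Entry header, e.g.
# "08/04/12 08:44:41 - Process(1720412.1165) User(mqad) Program(amqrmppa)"
# or "08/04/2012 08:44:09 AM - Process(17174.16023) ..."
ENTRY_HEADER = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{2,4}) \d{2}:\d{2}:\d{2}(?: [AP]M)? - Process\("
)
# Message line for the two connection errors quoted in this thread.
CONNECTION_ERROR = re.compile(r"^\s*(AMQ9209|AMQ9259): .*host '([^']+)'")


def tally_connection_errors(log_text):
    """Count AMQ9209/AMQ9259 entries keyed by (message id, host, date)."""
    counts = Counter()
    date = None
    for line in log_text.splitlines():
        header = ENTRY_HEADER.match(line)
        if header:
            date = header.group(1)  # remember the date of the current entry
            continue
        error = CONNECTION_ERROR.match(line)
        if error and date:
            counts[(error.group(1), error.group(2), date)] += 1
    return counts
```

Reading an error log file into a string and passing it to `tally_connection_errors` gives a counter whose largest entries show which partner hosts time out most often on a given day.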