Question

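We have an application that uses WebSphere MQ 7.0.1.3. During heavy testing in the staging environment, the disk filled up.

After that, MQ hung. We deleted application logs (unrelated to MQ) and added more disk space, but that did not solve the problem.

We tried restarting the queue manager: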

$ endmqlsr
$ endmqm XYZ
$ strmqm XYZ
WebSphere MQ queue manager 'XYZ' starting.
WebSphere MQ was unable to display an error message 893.

Logs from when the disk was full and the errors occurred:

----- amqxfdcx.c : 828 --------------------------------------------------------
06/08/2018 03:36:44 AM - Process(8832.5) User(mqm) Program(amqzlaa0)
AMQ6119: An internal WebSphere MQ error has occurred (Rc=28 from write)
----- amqxfdcx.c : 783 --------------------------------------------------------
06/08/2018 03:36:44 AM - Process(8832.5) User(mqm) Program(amqzlaa0)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager XYZ.
----- amqxfdcx.c : 822 --------------------------------------------------------
06/08/2018 03:36:46 AM - Process(8832.5) User(mqm) Program(amqzlaa0)
AMQ6119: An internal WebSphere MQ error has occurred (Rc=28 from write)
----- amqxfdcx.c : 783 --------------------------------------------------------
06/08/2018 03:36:46 AM - Process(8832.5) User(mqm) Program(amqzlaa0)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager XYZ.
AMQ6119: An internal WebSphere MQ error has occurred ('28 - No space left on device' from semget.)
----- amqxfdcx.c : 783 --------------------------------------------------------
06/14/2018 02:35:46 PM - Process(6794.1) User(mqm) Program(amqzxma0)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager XYZ.
----- amqxfdcx.c : 822 --------------------------------------------------------
06/14/2018 02:35:46 PM - Process(6794.1) User(mqm) Program(amqzxma0)
AMQ6118: An internal WebSphere MQ error has occurred (20006037)

When trying to connect with IBM WebSphere MQ Explorer:

Queue manager not available for connection - reason 2059. (AMQ4043)
Severity: 20 (Error)
Explanation: The attempt to connect to the queue manager failed. This could be because the queue manager is incorrectly configured to allow a connection from this system, or the connection has been broken.
Response: Ensure that the queue manager is running. If the queue manager is running on another computer, ensure it is configured to accept remote connections.

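Is there a way to clear all messages from the queues and reset all flags, so that the queue manager starts and the queues work again?

The queues hold only old test data of no value.

Or do you have any other suggestions for how to resolve this?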

Answer 1

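You can use the mqrc command to get more information about an error. Most of the time MQ reports return codes as a four-digit decimal number. In this case the return code has three digits, which usually (always?) means it is the HEX return code.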

$ mqrc 2195

      2195  0x00000893  MQRC_UNEXPECTED_ERROR
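
The hex reading can be cross-checked without MQ installed. A minimal sketch using the shell's own printf; the helper name hex_to_dec is made up for this illustration and is not part of MQ:

```shell
# Convert a hexadecimal MQ return code to the decimal form that mqrc lists.
# hex_to_dec is a hypothetical helper, not an MQ command.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

hex_to_dec 893   # prints 2195
```

The decimal value can then be passed to mqrc as shown above.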

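This error is thrown when MQ runs into an error condition it did not expect. You will usually find FDC files in the /var/mqm/errors directory that give more detail.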
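
A rough sketch of a first diagnostic pass, assuming the default /var/mqm data path mentioned above. The helper name newest_fdc is made up for this illustration. The ipcs check is an extra assumption on my part: semget(2) returns "No space left on device" when a semaphore limit is reached, so the semget error in the question's log may not be about disk space at all.

```shell
# List the most recently modified FDC files in an MQ errors directory.
# newest_fdc is a hypothetical helper; the default path is the one named in the answer.
newest_fdc() {
    dir="${1:-/var/mqm/errors}"
    # ls -t sorts newest first; errors are suppressed when no FDC files exist.
    ls -t "$dir"/*.FDC 2>/dev/null | head -n 5
}

newest_fdc

# Confirm the file system holding the MQ data really has free space again.
df -h /var/mqm 2>/dev/null || echo "/var/mqm not found on this host"

# Assumption: leftover semaphore sets owned by mqm may be worth a look as well.
ipcs -s 2>/dev/null || echo "ipcs not available on this host"
```

The header of each FDC file carries a probe ID and the failing component, which is what IBM support would normally ask for.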

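When you get this kind of error, the best course is to open a PMR with IBM and have them provide recovery directions, so that you have the best chance of keeping any messages that may be on the queues. However, you are using a version of MQ (7.0) that has been out of support since September 30, 2015. The specific fix pack you are on (7.0.1.3) was released in August 2010. The last fix pack IBM released for v7.0 was 7.0.1.14, in August 2016.

If you pay IBM for extended support, you can open a PMR with them to get more help.

Once the problem is resolved, the best path is to migrate to a supported version of IBM MQ. Currently v8.0 and v9.0 are the only supported versions.

Assuming you do not have extended support and cannot get help from IBM, here are some suggested steps:

Updating to the latest Fix Pack (7.0.1.14) may help. Even if it does not solve the problem, it is best to be on the latest fix pack of an unsupported IBM MQ version.
You can try to cold start the queue manager and see if that helps. This is documented starting on page 4 of the presentation "WebSphere MQ Disaster Recovery" given by Mark Taylor at Capitalware's MQ Technical Conference v2.0.1.3.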

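Create a queue manager EXACTLY like the one that failed
Use qm.ini to work out the parameters for the crtmqm command
Log:
  LogPrimaryFiles=10
  LogSecondaryFiles=10
  LogFilePages=65535
  LogType=CIRCULAR
Issue the crtmqm command

crtmqm -lc -lf 65535 -lp 10 -ls 10 -ld /tmp/mqlogs TEMP.QMGR

Make sure there is enough space for the new log files in that directory
The name of the dummy queue manager is irrelevant
Only care about getting hold of the log files
Don't start this dummy queue manager, just create it
Replace the old logs and amqhlctl.lfh with the new ones (QM1 is the slide's example name; use your own queue manager's log directory)
cd /var/mqm/log
mv QM1 QM1.SAVE
mv '/tmp/mqlogs/TEMP!QMGR' QM1
Note the "mangled" directory name... this is normal
Data on the queues is preserved if the messages are persistent
Object definitions are also preserved
Objects have their own definitions in their own files
The mapping between files and object names is held in QMQMOBJCAT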

Once all of the above is done, try to start the queue manager.
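
The first steps of the cold start can be sketched in shell, assuming qm.ini contains a Log stanza like the one shown above. The helper names crtmqm_from_ini and mangled_name are made up for this illustration. The sketch only prints the crtmqm command, it does not run it, and the name mangling covers only the dot-to-exclamation-mark case seen on the slide.

```shell
# Print (not run) a crtmqm command that matches the Log stanza of a qm.ini file.
# $1 = path to qm.ini, $2 = directory for the new logs, $3 = dummy queue manager name.
# crtmqm_from_ini is a hypothetical helper, not an MQ command.
crtmqm_from_ini() {
    awk -F= -v logdir="$2" -v qmgr="$3" '
        { gsub(/[ \t\r]/, "") }   # drop indentation and stray whitespace
        $1 == "LogPrimaryFiles"   { lp = $2 }
        $1 == "LogSecondaryFiles" { ls = $2 }
        $1 == "LogFilePages"      { lf = $2 }
        $1 == "LogType"           { lt = ($2 == "LINEAR") ? "-ll" : "-lc" }
        END { printf "crtmqm %s -lf %s -lp %s -ls %s -ld %s %s\n", lt, lf, lp, ls, logdir, qmgr }
    ' "$1"
}

# Show the log directory name MQ derives from a queue manager name containing dots.
# mangled_name is a hypothetical helper; it handles only the "." to "!" case.
mangled_name() {
    printf '%s\n' "$1" | tr '.' '!'
}

# Usage with a throwaway copy of the stanza from the slides.
ini=$(mktemp)
printf 'Log:\n   LogPrimaryFiles=10\n   LogSecondaryFiles=10\n   LogFilePages=65535\n   LogType=CIRCULAR\n' > "$ini"
crtmqm_from_ini "$ini" /tmp/mqlogs TEMP.QMGR
# prints: crtmqm -lc -lf 65535 -lp 10 -ls 10 -ld /tmp/mqlogs TEMP.QMGR
mangled_name TEMP.QMGR
# prints: TEMP!QMGR
rm -f "$ini"
```

Printing the command first makes it easy to compare the log parameters against qm.ini before anything is created. A mismatch in log file size or type is the kind of difference the "EXACTLY like the one that failed" step is meant to rule out.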
