Corrupted RPM package database on Elastic Beanstalk instances

Time: 2014-02-25 22:00:34

Tags: linux amazon rpm elastic-beanstalk

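I'm not sure whether this belongs here rather than on serverfault, but here goes. I have a dev and a prod environment set up in Elastic Beanstalk for a Wordpress site. Most of the time I can use git aws.push without any problems. This morning, however, a push failed, first against the dev instance and then against the prod instance. Every subsequent push to either environment failed in the same way. In /var/log/eb-tools.log I noticed errors like this: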

2014-02-25 15:55:41,071 [INFO] (19684 MainThread) [command.py-122] [root command execute] Executing commands: ['Infra-WriteRuntimeConfig', 'Infra-WriteApplication1', 'Infra-WriteApplication2', 'Infra-EmbeddedPreBuild', 'Hook-PreAppDeploy', 'Infra-EmbeddedPostBuild', 'Hook-EnactAppDeploy', 'Hook-PostAppDeploy'] - AWSEBAutoScalingGroup
2014-02-25 15:55:55,367 [INFO] (19684 MainThread) [command.py-130] [root command execute] Command returned: (code: 1, stdout: Error occurred during build: Command 01_install_npm failed
, stderr: None)
2014-02-25 15:55:55,374 [DEBUG] (19684 MainThread) [commandWrapper.py-60] [root commandWrapper main] Command result: {'status': 'FAILURE', 'results': [{'status': 'FAILURE', 'config_sets': ['Infra-WriteRuntimeConfig', 'Infra-WriteApplication1', 'Infra-WriteApplication2', 'Infra-EmbeddedPreBuild', 'Hook-PreAppDeploy', 'Infra-EmbeddedPostBuild', 'Hook-EnactAppDeploy', 'Hook-PostAppDeploy'], 'returncode': 1, 'events': [], 'msg': 'Error occurred during build: Command 01_install_npm failed\n'}], 'api_version': '1.0'}

It looks like the npm install failed, which caused the rest of the push to fail.

In cfn-init.log I also found errors like this:

2014-02-25 15:55:45,706 [DEBUG] No test for command 01_install_npm
2014-02-25 15:55:54,728 [ERROR] Command 01_install_npm (yum install -y --enablerepo=epel nodejs npm) failed
2014-02-25 15:55:54,753 [DEBUG] Command 01_install_npm output: Loaded plugins: priorities, update-motd, upgrade-helper
649 packages excluded due to repository priority protections
Package nodejs-0.10.25-1.el6.x86_64 already installed and latest version

2014-02-25 15:55:54,754 [ERROR] Error encountered during build of prebuild_0_Skeeter_Snacks: Command 01_install_npm failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/cfnbootstrap/construction.py", line 511, in run_config
    CloudFormationCarpenter(config, self._auth_config).build(worklog)
  File "/usr/lib/python2.6/site-packages/cfnbootstrap/construction.py", line 247, in build
    changes['commands'] = CommandTool().apply(self._config.commands)
  File "/usr/lib/python2.6/site-packages/cfnbootstrap/command_tool.py", line 113, in apply
    raise ToolError(u"Command %s failed" % name)
ToolError: Command 01_install_npm failed
2014-02-25 15:55:54,874 [ERROR] Unhandled exception during build: Command 01_install_npm failed
Traceback (most recent call last):
  File "/opt/aws/bin/cfn-init", line 122, in <module>
    worklog.build(detail.metadata, configSets)
  File "/usr/lib/python2.6/site-packages/cfnbootstrap/construction.py", line 117, in build
    Contractor(metadata).build(configSets, self)
  File "/usr/lib/python2.6/site-packages/cfnbootstrap/construction.py", line 502, in build
    self.run_config(config, worklog)
  File "/usr/lib/python2.6/site-packages/cfnbootstrap/construction.py", line 511, in run_config
    CloudFormationCarpenter(config, self._auth_config).build(worklog)
  File "/usr/lib/python2.6/site-packages/cfnbootstrap/construction.py", line 247, in build
    changes['commands'] = CommandTool().apply(self._config.commands)
  File "/usr/lib/python2.6/site-packages/cfnbootstrap/command_tool.py", line 113, in apply
    raise ToolError(u"Command %s failed" % name)
ToolError: Command 01_install_npm failed
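
When a log like the one above gets long, it can help to pull out just the failing command so it can be rerun by hand. The following is a minimal sketch, assuming the "[ERROR] Command NAME (COMMAND) failed" line format shown above; the function name failed_commands is made up for illustration, and the log path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Sketch only: failed_commands is a hypothetical helper, not part of any AWS tooling.
# It prints "NAME: COMMAND" for every cfn-init log line of the form
#   <timestamp> [ERROR] Command NAME (COMMAND) failed
failed_commands() {
    log_file=$1
    sed -n 's/^.*\[ERROR] Command \([^ ]*\) (\(.*\)) failed$/\1: \2/p' "$log_file"
}
```

Run against the cfn-init.log excerpt above, this prints the single line `01_install_npm: yum install -y --enablerepo=epel nodejs npm`. The other [ERROR] lines and the traceback are skipped because they do not carry the command text in parentheses.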

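So the npm install is failing because the underlying yum (rpm) command itself is failing. I decided to run the failing command myself to see exactly what the problem was, and ended up with something like this (not the exact error, since I can no longer reproduce it):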

> sudo yum install -y --enablerepo=epel nodejs npm
error: Fatal error, run database recovery
error: cannot open Packages index using db5 - (-30973)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:
Error: rpmdb open failed
- See more at: http://linuxsysconfig.com/2013/03/recover-the-rpm-database-on-fedora-18/#sthash.8nFhvVak.dpuf

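It appears that the RPM package database on the instances was corrupted (I found no lock files, so I assume it was corruption rather than a stale lock). I got the same error on both prod and dev. Running sudo rpm -v --rebuilddb fixed the problem on both instances and allowed me to push successfully again. I have never run into this problem before. The artifact that was pushed successfully before the failure is the same one that was pushed during and after the failure, so I'm not convinced this is a problem with any code or config file. I also found no race conditions between users, no network hiccups, and no other unusual log entries that would suggest the deployment process was interrupted while a dependent process was using the package database. Finally, the last entries in both log files before the failure were normal success entries.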
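
For reference, the manual recovery described above can be sketched as a small script. This is only an illustration under stated assumptions: the function names and the backup naming scheme are made up, and the step that removes the __db.* files is the commonly used cleanup of stale Berkeley DB environment files rather than something I actually had to do (rpm -v --rebuilddb alone was enough in my case). The only rpm options used are --rebuilddb and --dbpath.

```shell
#!/bin/sh
# Sketch only: function names and backup layout are illustrative assumptions.

# Copy the RPM database directory to a timestamped backup
# and print the backup path.
backup_rpmdb() {
    db_dir=$1
    backup_dir="${db_dir}.bak.$(date +%Y%m%d%H%M%S)"
    cp -a "$db_dir" "$backup_dir" && printf '%s\n' "$backup_dir"
}

# Remove stale Berkeley DB environment files (__db.001, __db.002, ...).
# The Packages file and the index files are left untouched.
clear_stale_db_env() {
    db_dir=$1
    rm -f "$db_dir"/__db.*
}

# Back up, clear the stale environment files, then rebuild the indexes.
# On a real instance this has to run as root.
recover_rpmdb() {
    db_dir=${1:-/var/lib/rpm}
    backup_rpmdb "$db_dir" >/dev/null || return 1
    clear_stale_db_env "$db_dir"
    rpm -v --dbpath "$db_dir" --rebuilddb
}
```

On an instance this would be invoked as `recover_rpmdb /var/lib/rpm` from a root shell. Taking the backup first means the original database files can be inspected afterwards, which may help when trying to work out what caused the corruption.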

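I would like to understand the cause of this problem so that I can avoid it without resorting to something as crude as dropping a config file containing the rebuild command into the .ebextensions folder. Has anyone experienced anything like this before? Does anyone have pointers on where to look, or things to consider, to prevent this from happening again?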
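
To make the workaround I am trying to avoid concrete, here is a rough sketch of a guard that rebuilds only when the database fails a health check, instead of rebuilding on every deploy. It assumes that rpm -qa exits non-zero when the database cannot be opened. The names ensure_rpmdb, RPMDB_CHECK and RPMDB_REPAIR are hypothetical and exist only so the logic can be exercised without touching a real database.

```shell
#!/bin/sh
# Sketch only: ensure_rpmdb, RPMDB_CHECK and RPMDB_REPAIR are hypothetical names.
# Assumption: "rpm -qa" returns a non-zero status when the rpmdb cannot be opened.
: "${RPMDB_CHECK:=rpm -qa}"
: "${RPMDB_REPAIR:=rpm -v --rebuilddb}"

ensure_rpmdb() {
    # Healthy database: nothing to do.
    if $RPMDB_CHECK >/dev/null 2>&1; then
        echo "rpmdb ok"
        return 0
    fi
    echo "rpmdb check failed, rebuilding"
    # Rebuild, then check again so a failed repair is reported to the caller.
    $RPMDB_REPAIR >/dev/null 2>&1 || return 1
    $RPMDB_CHECK >/dev/null 2>&1
}
```

A script like this could be called from a deploy-time command that runs before the yum install step. It still only treats the symptom, which is why I would rather find the root cause.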

Thanks.

0 answers:

No answers