ArangoDB stops after a dev-xvdb timeout and won't restart

Time: 2017-04-04 15:39:09

Tags: arangodb

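I installed ArangoDB 3.1.16 on an AWS C4 instance. I have a Foxx service that I'm trying to run in production. On average it receives 10 packets of 200 octets per second and returns 20 packets of 200 octets per second.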

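Every time I start my process, the Foxx service runs with consistent performance for about an hour and then suddenly stops. After that I can no longer reach my Foxx API: every request fails with a connection timeout, and nothing is printed to the Foxx log. I can't access the web interface anymore either: the page doesn't load.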

About a minute later, the Foxx log shows an error message: 'ArangoError 18: lock timeout'

After another minute, the log shows requests that are normally fast but now take a very long time (WARNING {queries} slow query: took: 1770.862498)

Using 'journalctl -xe', I found that after a foreign IP tried to connect, I got "Job dev-xvdb.device/start timed out".

I managed to restart arango using:

ps -eaf | grep arangod
sudo kill #
sudo apt-get --reinstall install arangodb3=3.1.16

How can I fix this recurring problem?

'journalctl -xe' gives me:

Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Failed with result 'exit-code'.
-- Subject: Unit arangodb3.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit arangodb3.service has begun starting up.
Apr 04 15:03:10 my-ip arangodb3[11481]:  * Starting arango database server arangod
Apr 04 15:03:10 my-ip arangodb3[11481]:  * database version check failed, maybe you need to run 'upgrade'?
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Control process exited, code=exited status=1
Apr 04 15:03:10 my-ip systemd[1]: Failed to start LSB: arangodb.
-- Subject: Unit arangodb3.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit arangodb3.service has failed.
-- 
-- The result is failed.
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Unit entered failed state.
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Failed with result 'exit-code'.
Apr 04 15:03:10 my-ip sudo[11346]: pam_unix(sudo:session): session closed for user root
Apr 04 15:03:17 my-ip sshd[11502]: Did not receive identification string from UNKNOWN IP 1
Apr 04 15:03:21 my-ip sshd[11503]: Connection closed by UNKNOWN IP 2 port 54736 [preauth]
Apr 04 15:03:21 my-ip sshd[11507]: Did not receive identification string from UNKNOWN IP 2
Apr 04 15:03:21 my-ip sshd[11506]: fatal: Unable to negotiate with UNKNOWN IP 2 port 54730: no matching host key type found. Their offer: ssh-dss [preauth]
Apr 04 15:03:21 my-ip sshd[11504]: Connection closed by UNKNOWN IP 2 port 54732 [preauth]
Apr 04 15:03:22 my-ip sshd[11505]: Connection closed by UNKNOWN IP 2 port 54734 [preauth]
Apr 04 15:03:40 my-ip systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start timed out.
Apr 04 15:03:40 my-ip systemd[1]: Timed out waiting for device dev-xvdb.device.
-- Subject: Unit dev-xvdb.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit dev-xvdb.device has failed.
-- 
-- The result is timeout.
Apr 04 15:03:40 my-ip systemd[1]: Dependency failed for File System Check on /dev/xvdb.
-- Subject: Unit systemd-fsck@dev-xvdb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit systemd-fsck@dev-xvdb.service has failed.
-- 
-- The result is dependency.
Apr 04 15:03:40 my-ip systemd[1]: Dependency failed for /mnt.
-- Subject: Unit mnt.mount has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit mnt.mount has failed.
-- 
-- The result is dependency.
Apr 04 15:03:40 my-ip systemd[1]: mnt.mount: Job mnt.mount/start failed with result 'dependency'.
Apr 04 15:03:40 my-ip systemd[1]: systemd-fsck@dev-xvdb.service: Job systemd-fsck@dev-xvdb.service/start failed with result 'dependency'.
Apr 04 15:03:40 my-ip systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start failed with result 'timeout'.
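The journal above mixes sshd noise from the unknown IPs with the systemd failures that actually matter. Filtering for "failed with result" lines makes the chain easier to read: dev-xvdb.device timed out, so the fsck for /dev/xvdb and the /mnt mount failed with 'dependency'. This is a minimal Python sketch; the names FAILED_RE and failed_units are hypothetical, and it assumes journal lines in exactly the format pasted here.

```python
import re

# Matches systemd lines such as
#   "... systemd[1]: mnt.mount: Job mnt.mount/start failed with result 'dependency'."
#   "... systemd[1]: arangodb3.service: Failed with result 'exit-code'."
# The closing quote may be a typographic ’ when logs are copied from a web page.
FAILED_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>[^\s:]+): (?:Job \S+ )?[Ff]ailed with result "
    r"'(?P<result>[^'’]+)['’]"
)


def failed_units(journal_text: str) -> dict[str, str]:
    """Map each systemd unit to the last failure result reported for it."""
    results: dict[str, str] = {}
    for line in journal_text.splitlines():
        m = FAILED_RE.search(line)
        if m:
            results[m.group("unit")] = m.group("result")
    return results


sample = """\
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Failed with result 'exit-code'.
Apr 04 15:03:40 my-ip systemd[1]: mnt.mount: Job mnt.mount/start failed with result 'dependency'.
Apr 04 15:03:40 my-ip systemd[1]: systemd-fsck@dev-xvdb.service: Job systemd-fsck@dev-xvdb.service/start failed with result 'dependency'.
Apr 04 15:03:40 my-ip systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start failed with result 'timeout'.
"""
for unit, result in failed_units(sample).items():
    print(f"{unit}: {result}")
```

Feeding it the output of 'journalctl -xe' instead of the sample string drops the sshd lines and keeps one result per unit.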

I tried:

sudo curl --dump - -X GET http://127.0.0.1:8529/_api/version && echo

It gives me:

HTTP/1.1 401 Unauthorized
Www-Authenticate: Bearer token_type="JWT", realm="ArangoDB"
Server: ArangoDB
Connection: Keep-Alive
Content-Type: text/plain; charset=utf-8
Content-Length: 0
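The 401 says something is still accepting connections on port 8529 and answering HTTP (note the 'Server: ArangoDB' header); it just wants credentials. Below is a Python sketch of the same probe with credentials, to tell "answering but locked out" apart from "not answering at all". It assumes the server accepts HTTP Basic authentication on /_api/version; the helper names version_request, probe and describe, and any user/password you pass in, are placeholders rather than part of the original setup.

```python
import base64
import urllib.error
import urllib.request
from typing import Optional


def version_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET /_api/version request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(base_url.rstrip("/") + "/_api/version")
    req.add_header("Authorization", "Basic " + token)
    return req


def probe(req: urllib.request.Request, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if the server never answered."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # e.g. 401: the server answered, credentials rejected
        return err.code
    except OSError:                        # refused, unreachable or timed out
        return None


def describe(status: Optional[int]) -> str:
    """Turn a probe result into a short human-readable verdict."""
    if status is None:
        return "no answer (process hung or down)"
    if status == 401:
        return "server is answering; authentication required"
    if 200 <= status < 300:
        return "server is answering and the credentials work"
    return f"server answered with HTTP {status}"
```

On the server this would be called as describe(probe(version_request("http://127.0.0.1:8529", user, password))). Catching HTTPError before OSError matters, because HTTPError is itself an OSError subclass and would otherwise be reported as "no answer".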

I tried:

ps auxw | fgrep arangod

It gives me:

root     10439  0.0  0.1  82772  8664 ?        Ss   10:09   0:00 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor
arangodb 10440  5.7 94.5 12901776 7242340 ?    Sl   10:09  16:36 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor
ubuntu   11339  0.0  0.0  12916  1000 pts/0    R+   14:59   0:00 grep -F --color=auto arangod
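The listing shows two arangod processes from the 10:09 start still alive: 10439 running as root with --supervisor, and 10440 running as arangodb. Picking these PIDs out by hand for 'sudo kill #' is easy to get wrong, so here is a hypothetical helper, arangod_pids, sketched in Python under the assumption that its input is 'ps auxw' or 'ps -eaf' output like the above.

```python
ARANGOD = "/usr/sbin/arangod"  # binary path as it appears in the listing above


def arangod_pids(ps_output: str) -> list[int]:
    """Return the PIDs of arangod processes in 'ps auxw' or 'ps -eaf' output.

    Both formats put the PID in the second column. A line counts only if one of
    its later fields is exactly the arangod binary path, which skips the grep
    line itself and any header line.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) > 2 and fields[1].isdigit() and ARANGOD in fields[2:]:
            pids.append(int(fields[1]))
    return pids
```

Matching the exact binary path rather than the substring "arangod" is the design choice here: it is what keeps 'grep -F --color=auto arangod' out of the result.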

'arangod restart' gives me:

2017-04-04T15:01:16Z [11344] INFO ArangoDB 3.1.16 [linux] 64bit, using VPack 0.1.30, ICU 54.1, V8 5.0.71.39, OpenSSL 1.0.2g  1 Mar 2016
2017-04-04T15:01:16Z [11344] INFO using SSL options: SSL_OP_CIPHER_SERVER_PREFERENCE, SSL_OP_TLS_ROLLBACK_BUG
2017-04-04T15:01:16Z [11344] FATAL could not open shutdown file '/var/log/arangodb3/restart/SHUTDOWN': internal error

'service arangodb3 restart' gives me (after a short wait):

Job for arangodb3.service failed because the control process exited with error code. See "systemctl status arangodb3.service" and "journalctl -xe" for details.

'systemctl status arangodb3.service' gives me:

arangodb3.service - LSB: arangodb
Loaded: loaded (/etc/init.d/arangodb3; bad; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2017-04-04 15:03:10 UTC; 34s ago
Docs: man:systemd-sysv-generator(8)
Process: 11352 ExecStop=/etc/init.d/arangodb3 stop (code=exited, status=0/SUCCESS)
Process: 11481 ExecStart=/etc/init.d/arangodb3 start (code=exited, status=1/FAILURE)
Tasks: 83
Memory: 6.5G
CPU: 73ms
CGroup: /system.slice/arangodb3.service
├─10439 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor
└─10440 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor
Apr 04 15:03:10 my-ip systemd[1]: Starting LSB: arangodb...
Apr 04 15:03:10 my-ip arangodb3[11481]:  * Starting arango database server arangod
Apr 04 15:03:10 my-ip arangodb3[11481]:  * database version check failed, maybe you need to run 'upgrade'?
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Control process exited, code=exited status=1
Apr 04 15:03:10 my-ip systemd[1]: Failed to start LSB: arangodb.
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Unit entered failed state.

1 answer:

Answer 0 (score: 1)

From your log output, it looks like the mounted disk volume disappeared.

If the storage underneath any kind of database disappears, there is no sane way for the database to keep working.

So the effect you see is that ArangoDB can no longer work with its data: from its point of view, the data simply isn't there anymore.
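One way to check this on the machine is to ask which mount holds the ArangoDB data directory, and whether /mnt (the mount the journal says failed because /dev/xvdb timed out) is mounted at all. This is a minimal Python sketch over /proc/mounts-style text; parse_mounts and backing_mount are hypothetical names, and /var/lib/arangodb3 is assumed to be the data directory (the usual default for the Ubuntu packages), so substitute your own database.directory setting.

```python
from typing import Optional


def parse_mounts(mounts_text: str) -> dict[str, str]:
    """Map mount point -> device from /proc/mounts-style text.

    Each line is "device mountpoint fstype options dump pass". Octal escapes
    used for spaces in paths are not decoded in this sketch.
    """
    mounts: dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounts[fields[1]] = fields[0]  # a later mount on the same point wins
    return mounts


def backing_mount(path: str, mounts: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return (mount point, device) for the filesystem that holds `path`."""
    candidates = [
        mp for mp in mounts
        if path == mp or path.startswith(mp.rstrip("/") + "/")
    ]
    if not candidates:
        return None
    best = max(candidates, key=len)  # the longest matching mount point is the innermost one
    return best, mounts[best]
```

On the server, reading /proc/mounts into parse_mounts and calling backing_mount("/var/lib/arangodb3", ...) shows which device the data lives on, and checking whether "/mnt" is a key of the parsed dict tells you whether that mount exists right now.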

One effect others have observed is running out of I/O credits on AWS, and that may also be the cause of what you are seeing above.

https://aws.amazon.com/blogs/aws/new-burst-balance-metric-for-ec2s-general-purpose-ssd-gp2-volumes/

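If I understand it correctly, a larger volume gives you more headroom: gp2 volumes earn I/O credits in proportion to their size, so a bigger volume has a higher baseline and refills faster. If that doesn't help, you need to scale down your test scenario, or choose a different hosting option that doesn't throttle I/O operations.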
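The burst-credit arithmetic behind this can be made concrete. Per AWS's gp2 documentation, a volume starts with 5.4 million I/O credits, can burst to 3,000 IOPS, and earns credits at its baseline of 3 IOPS per GiB (minimum 100). At a sustained demand D above the baseline B, a full bucket therefore lasts 5,400,000 / (D - B) seconds. Below is a Python sketch of that formula; the constants are the published gp2 figures as I understand them (AWS has raised the baseline cap over time), and the function names are hypothetical.

```python
import math

INITIAL_CREDITS = 5_400_000  # credits a new gp2 volume starts with (also the bucket size)
BURST_IOPS = 3_000           # maximum burst rate
IOPS_PER_GIB = 3             # baseline (and credit earn rate) per GiB
MIN_BASELINE_IOPS = 100      # floor for small volumes
MAX_BASELINE_IOPS = 16_000   # current cap; it was 10,000 in 2017


def baseline_iops(size_gib: int) -> int:
    """Sustained IOPS of a gp2 volume, which is also how fast it earns credits."""
    return min(MAX_BASELINE_IOPS, max(MIN_BASELINE_IOPS, IOPS_PER_GIB * size_gib))


def seconds_until_depleted(size_gib: int, demand_iops: float = BURST_IOPS,
                           credits: float = INITIAL_CREDITS) -> float:
    """How long `credits` last while the workload asks for `demand_iops`."""
    served = min(demand_iops, BURST_IOPS)     # burst is capped at 3,000 IOPS
    drain = served - baseline_iops(size_gib)  # net credits spent per second
    if drain <= 0:
        return math.inf                       # the baseline covers the load
    return credits / drain


for size in (8, 100, 500, 1000):
    t = seconds_until_depleted(size)
    lasts = "indefinitely" if math.isinf(t) else f"{t / 60:.0f} min"
    print(f"{size:>5} GiB: baseline {baseline_iops(size):>5} IOPS, full burst lasts {lasts}")
```

Plugging in the real volume size and the I/O rate your Foxx service actually generates, and comparing against the BurstBalance metric from the linked post, gives a rough check of whether about an hour of load is enough to drain the bucket.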