I have started seeing this issue in the last couple of days: the Ganglia gmetad process is terminated by SIGSEGV (segfault) within 5 minutes of starting.
It had been stable for the last few months, so I'm not sure what changed.
Version - gmetad 3.7.1
I don't see a core dump, or anything specific to gmetad, in /var/log/messages or /var/log/secure.
System snapshot (from top) at the time of the event:
load average: 1.97, 0.99, 0.42
Memory also looks fine:
free -m
             total       used       free     shared    buffers     cached
Mem:          7989       3624       4364          0        333       2562
-/+ buffers/cache:        728       7260
Swap:         4095          0       4095
I have a supervisord process that forks and watches gmetad.
Here is the supervisor log:
2016-10-20 14:34:55,707 INFO exited: gmetad (terminated by SIGSEGV; not expected)
2016-10-20 14:34:55,707 INFO received SIGCLD indicating a child quit
2016-10-20 14:34:57,712 INFO spawned: 'gmetad' with pid 24561
2016-10-20 14:34:59,929 INFO exited: gmetad (terminated by SIGSEGV; not expected)
2016-10-20 14:34:59,929 INFO received SIGCLD indicating a child quit
2016-10-20 14:35:02,932 INFO spawned: 'gmetad' with pid 24593
2016-10-20 14:35:04,897 INFO exited: gmetad (terminated by SIGSEGV; not expected)
2016-10-20 14:35:04,897 INFO received SIGCLD indicating a child quit
2016-10-20 14:35:08,903 INFO spawned: 'gmetad' with pid 24618
2016-10-20 14:35:11,257 INFO exited: gmetad (terminated by SIGSEGV; not expected)
2016-10-20 14:35:11,257 INFO received SIGCLD indicating a child quit
2016-10-20 14:35:12,257 INFO gave up: gmetad entered FATAL state, too many start retries too quickly
Has anyone faced this kind of issue with gmetad in particular? I'd appreciate any pointers.
Answer 0 (score: 0)
I was able to identify the problem and resolve it.
Some key steps/findings:
In my case, one specific file, 'part_max_used.rrd', under /path/to/ganglia/rrds/node_name, was the root cause of the SIGSEGV.
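For anyone hunting for a similar file, here is a rough sketch of one way to narrow down candidates. It rests on an assumption not stated above: that the offending .rrd file is damaged on disk (empty or truncated). The function name find_suspect_rrds is hypothetical, and the only check is whether each file starts with the "RRD" cookie that RRDtool writes at the start of its files. A file that passes can still be broken in other ways, so treat the output as a hint only.

```python
import os

# RRDtool files begin with the bytes "RRD" followed by a NUL.
RRD_COOKIE = b"RRD\x00"


def find_suspect_rrds(root):
    """Return (path, reason) pairs for .rrd files under root that look damaged.

    Hypothetical helper: it only inspects the first four bytes of each file.
    """
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    header = fh.read(len(RRD_COOKIE))
            except OSError as exc:
                suspects.append((path, "unreadable: %s" % exc))
                continue
            if not header:
                suspects.append((path, "empty file"))
            elif header != RRD_COOKIE:
                suspects.append((path, "missing RRD header cookie"))
    return suspects
```

Calling find_suspect_rrds("/path/to/ganglia/rrds") returns a list of paths with a short reason for each. Moving a flagged file out of the rrds directory (rather than deleting it) and restarting gmetad is a cautious way to test whether that file is the trigger.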
Hope this helps :-)