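I have a Raspberry Pi 3+ set up to run Grafana (with InfluxDB and Telegraf) to collect network statistics for my home network. I read data from a SonicWall, a "smart managed" HP switch and two Cisco switches. There are also some metrics on ping times and packet loss, and the machine hosts my Unifi access point manager as well.
This has been working for about 6 months. Over the last few days, InfluxDB has gotten sick. Grafana started showing 501 errors when trying to query InfluxDB. I rebooted the Pi and it came back... but about 12 hours later I was back to 501s.
I can see that InfluxDB is pinning the CPU. CPU usage was never high before, but now it sits consistently between 200% and 250%. The puzzling part is that, as far as I know, nothing has changed the query load on the database.
I think things got worse when I upgraded to InfluxDB 1.7.7, but I don't know what the previous version was. On top of that, I'm having trouble gathering any information from InfluxDB, because it pins the CPU as soon as it starts and the host becomes unresponsive.
How can I diagnose InfluxDB's high CPU usage?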
Here is htop showing InfluxDB using over 350% CPU:
-------------------------------------------------------------------------------
2019-07-07 13:25:02
-------------------------------------------------------------------------------
1 [|||||||||||||||||||||||| 25.5%] Tasks: 36, 147 thr; 6 running
2 [|||||||||||||||||||||||||||| 29.5%] Load average: 3.43 3.84 3.78
3 [|||||||||||||||||||||||| 25.6%] Uptime: 00:47:19
4 [|||||||||||||||||||||||||||||||||||||||||||||||||| 54.7%]
Mem[|||||||||||||||||||||||||||||||||| 136M/926M]
Swp[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||99.9M/100.0M]
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
4306 influxdb 20 0 1019M 48344 30068 R 121. 5.1 0:07.04 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
4310 influxdb 20 0 1019M 48344 30068 S 16.4 5.1 0:00.43 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
4309 influxdb 20 0 1019M 48344 30068 S 11.8 5.1 0:00.34 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
4311 influxdb 20 0 1019M 48344 30068 S 7.2 5.1 0:00.37 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
559 telegraf 20 0 832M 18420 7440 S 2.6 1.9 3:08.39 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
4270 pi 20 0 6372 3060 2072 R 2.6 0.3 0:01.06 htop
116 root 20 0 29168 3012 2780 S 2.6 0.3 0:41.88 /lib/systemd/systemd-journald
4307 influxdb 20 0 1019M 48344 30068 S 2.0 5.1 0:00.04 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
4312 influxdb 20 0 1019M 48344 30068 S 1.3 5.1 0:00.24 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
1066 telegraf 20 0 832M 18420 7440 R 1.3 1.9 0:09.25 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1057 telegraf 20 0 832M 18420 7440 S 0.7 1.9 0:11.60 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
340 mongodb 20 0 232M 2492 1760 S 0.7 0.3 0:35.16 /usr/bin/mongod --config /etc/mongodb.conf
1234 telegraf 20 0 832M 18420 7440 S 0.7 1.9 0:07.61 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1239 telegraf 20 0 832M 18420 7440 S 0.7 1.9 0:08.03 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
451 mongodb 20 0 232M 2492 1760 S 0.7 0.3 0:14.52 /usr/bin/mongod --config /etc/mongodb.conf
345 root 20 0 23756 1036 556 S 0.7 0.1 0:11.47 /usr/sbin/rsyslogd -n
381 root 20 0 23756 1036 556 S 0.7 0.1 0:05.25 /usr/sbin/rsyslogd -n
659 unifi 20 0 1112M 20080 1832 S 0.7 2.1 0:15.78 unifi -cwd /usr/lib/unifi -home /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/jre -cp /usr/share/java/commons-daemon.jar:/usr/lib/unifi/lib/ac
445 mongodb 20 0 232M 2492 1760 S 0.7 0.3 0:05.27 /usr/bin/mongod --config /etc/mongodb.conf
721 www-data 20 0 224M 384 332 S 0.7 0.0 0:01.90 /usr/sbin/apache2 -k start
684 www-data 20 0 224M 384 332 S 0.7 0.0 0:01.90 /usr/sbin/apache2 -k start
756 unifi 20 0 1112M 20080 1832 S 0.7 2.1 0:02.29 unifi -cwd /usr/lib/unifi -home /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/jre -cp /usr/share/java/commons-daemon.jar:/usr/lib/unifi/lib/ac
765 grafana 20 0 924M 13820 3420 S 0.7 1.5 0:00.45 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid cfg:default.paths.logs=/var/log/
671 telegraf 20 0 832M 18420 7440 S 0.0 1.9 0:11.24 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
3627 telegraf 20 0 832M 18420 7440 S 0.0 1.9 0:01.78 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
740 telegraf 20 0 832M 18420 7440 S 0.0 1.9 0:07.68 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
663 telegraf 20 0 832M 18420 7440 S 0.0 1.9 0:20.88 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1081 telegraf 20 0 832M 18420 7440 S 0.0 1.9 0:14.85 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1248 telegraf 20 0 832M 18420 7440 S 0.0 1.9 0:12.42 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
666 root 20 0 916M 5004 1464 S 0.0 0.5 0:00.35 /usr/bin/containerd
4181 grafana 20 0 924M 13820 3420 S 0.0 1.5 0:00.03 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid cfg:default.paths.logs=/var/log/
1241 telegraf 20 0 832M 18420 7440 S 0.0 1.9 0:07.99 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
667 root 20 0 916M 5004 1464 S 0.0 0.5 0:00.43 /usr/bin/containerd
F1Help F2Setup F3SearchF4FilterF5Tree F6SortByF7Nice -F8Nice +F9Kill F10Quit
-------------------------------------------------------------------------------
2019-07-07 13:25:02
-------------------------------------------------------------------------------
1 [|||||||||||||||||||||||||| 28.0%] Tasks: 36, 147 thr; 3 running
2 [|||||||||||||||||||||||||||||||||||| 39.5%] Load average: 3.57 3.85 3.79
3 [|||||||||||||||||||||||||||||||||||||||||||||||||| 53.9%] Uptime: 00:47:45
4 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 76.3%]
Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 310M/926M]
Swp[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||99.4M/100.0M]
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
4306 influxdb 20 0 1972M 314M 123M S 189. 34.0 1:08.90 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
4316 influxdb 20 0 1972M 314M 123M R 99.5 34.0 0:14.78 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
4313 influxdb 20 0 1972M 314M 123M S 35.6 34.0 0:09.87 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
4314 influxdb 20 0 1972M 314M 123M S 27.7 34.0 0:10.05 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
559 telegraf 20 0 832M 19016 7712 S 4.0 2.0 3:10.10 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
740 telegraf 20 0 832M 19016 7712 S 3.3 2.0 0:07.75 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
4270 pi 20 0 6372 3060 2072 R 2.0 0.3 0:01.62 htop
340 mongodb 20 0 232M 3192 2460 S 1.3 0.3 0:35.51 /usr/bin/mongod --config /etc/mongodb.conf
663 telegraf 20 0 832M 19016 7712 S 0.7 2.0 0:21.13 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
451 mongodb 20 0 232M 3192 2460 S 0.7 0.3 0:14.66 /usr/bin/mongod --config /etc/mongodb.conf
4307 influxdb 20 0 1972M 314M 123M S 0.7 34.0 0:00.20 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
445 mongodb 20 0 232M 3192 2460 S 0.7 0.3 0:05.32 /usr/bin/mongod --config /etc/mongodb.conf
1248 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:12.55 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1250 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:12.64 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
664 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:09.70 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1241 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:08.22 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
542 root 20 0 929M 7600 1052 S 0.0 0.8 0:04.49 /usr/bin/dockerd -H unix://
3131 pi 20 0 11664 920 644 S 0.0 0.1 0:00.30 sshd: pi@pts/0
764 unifi 20 0 1112M 20212 1832 R 0.0 2.1 0:04.79 unifi -cwd /usr/lib/unifi -home /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/jre -cp /usr/share/java/commons-daemon.jar:/usr/lib/unifi/lib/ac
2910 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:04.93 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1057 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:11.69 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1234 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:07.79 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1236 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:13.93 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
671 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:11.35 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
1066 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:09.36 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
116 root 20 0 29168 3012 2780 S 0.0 0.3 0:42.06 /lib/systemd/systemd-journald
1239 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:08.07 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
3627 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:01.80 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
676 root 20 0 929M 7600 1052 S 0.0 0.8 0:00.47 /usr/bin/dockerd -H unix://
659 unifi 20 0 1112M 20212 1832 S 0.0 2.1 0:15.84 unifi -cwd /usr/lib/unifi -home /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/jre -cp /usr/share/java/commons-daemon.jar:/usr/lib/unifi/lib/ac
1081 telegraf 20 0 832M 19016 7712 S 0.0 2.0 0:14.87 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
345 root 20 0 23756 1036 556 S 0.0 0.1 0:11.52 /usr/sbin/rsyslogd -n
543 grafana 20 0 924M 13820 3420 S 0.0 1.5 0:06.82 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid cfg:default.paths.logs=/var/log/
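Besides CPU, the htop header in both snapshots shows swap essentially full (99.9M and 99.4M of 100.0M), which may point to memory pressure as well as CPU load. A minimal sketch for tracking that number from a script, assuming a Linux host that exposes /proc/meminfo; the function name swap_used_percent is made up for illustration:

```shell
# swap_used_percent: reads /proc/meminfo-style text on stdin and prints
# the percentage of swap in use as an integer (0 if there is no swap).
swap_used_percent() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END {
         if (total > 0) printf "%d\n", (total - free) * 100 / total
         else print 0
       }'
}

# On the Pi itself (only runs where /proc/meminfo is readable):
if [ -r /proc/meminfo ]; then
  swap_used_percent < /proc/meminfo
fi
```

Running this every few seconds while influxd starts up would show whether swap fills before the host stops responding.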
In this state, I can't even run the influx CLI:
$ influx
Failed to connect to http://localhost:8086: Get http://localhost:8086/ping: dial tcp [::1]:8086: connect: connection refused
Please check your connection settings and ensure 'influxd' is running.
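When the influx CLI cannot connect, the same /ping endpoint named in the error can be probed directly with curl. A rough sketch, assuming curl is installed and InfluxDB 1.x is on its default port 8086 (where a healthy /ping answers with HTTP 204); both function names are illustrative only:

```shell
# Print the HTTP status code returned by /ping. curl prints 000 when no
# HTTP response was received (for example on "connection refused").
influx_ping_code() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "${1:-http://localhost:8086}/ping"
}

# Turn a status code into a short human-readable verdict.
describe_ping_code() {
  case "$1" in
    204) echo "influxd is answering" ;;
    000) echo "no HTTP response (refused or timed out)" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Usage on the Pi:
#   describe_ping_code "$(influx_ping_code)"
```

Polling this in a loop is one way to see how long influxd stays up between restarts.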
I found that InfluxDB now logs to the systemd journal, so the logs are available via sudo journalctl -u influxdb.service. I've updated the question with what I've found so far.
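It turns out InfluxDB doesn't write a log file; it uses the journal.
Dumping the journal shows that the service starts up quickly, begins doing some compaction, and then runs out of memory. When that happens it shuts down... and then restarts.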
Jul 14 02:31:43 twang influxd[4139]: ts=2019-07-14T01:31:43.096464Z lvl=info msg="Compacting file" log_id=0GcXWe5l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcXZGU0000 op_name=ts
Jul 14 02:31:43 twang influxd[4139]: ts=2019-07-14T01:31:43.096497Z lvl=info msg="Compacting file" log_id=0GcXWe5l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcXZGU0000 op_name=ts
Jul 14 02:31:43 twang influxd[4139]: ts=2019-07-14T01:31:43.096198Z lvl=info msg="TSM compaction (start)" log_id=0GcXWe5l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcXZGU0001 op_
Jul 14 02:31:43 twang influxd[4139]: ts=2019-07-14T01:31:43.097520Z lvl=info msg="Beginning compaction" log_id=0GcXWe5l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcXZGU0001 op_na
Jul 14 02:31:43 twang influxd[4139]: ts=2019-07-14T01:31:43.097611Z lvl=info msg="Compacting file" log_id=0GcXWe5l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcXZGU0001 op_name=ts
Jul 14 02:31:43 twang influxd[4139]: ts=2019-07-14T01:31:43.097652Z lvl=info msg="Compacting file" log_id=0GcXWe5l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcXZGU0001 op_name=ts
Jul 14 02:31:43 twang influxd[4139]: ts=2019-07-14T01:31:43.097691Z lvl=info msg="Compacting file" log_id=0GcXWe5l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcXZGU0001 op_name=ts
Jul 14 02:31:43 twang influxd[4139]: ts=2019-07-14T01:31:43.097726Z lvl=info msg="Compacting file" log_id=0GcXWe5l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcXZGU0001 op_name=ts
:
:
:
Jul 14 01:55:08 twang influxd[1756]: ts=2019-07-14T00:55:08.256884Z lvl=info msg="TSM compaction (start)" log_id=0GcVQfaG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcVTIt0000 op_
Jul 14 01:55:08 twang influxd[1756]: ts=2019-07-14T00:55:08.288481Z lvl=info msg="Beginning compaction" log_id=0GcVQfaG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcVTIt0000 op_na
Jul 14 01:55:08 twang influxd[1756]: ts=2019-07-14T00:55:08.290445Z lvl=info msg="Compacting file" log_id=0GcVQfaG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcVTIt0000 op_name=ts
Jul 14 01:55:08 twang influxd[1756]: ts=2019-07-14T00:55:08.292220Z lvl=info msg="Compacting file" log_id=0GcVQfaG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcVTIt0000 op_name=ts
Jul 14 01:55:08 twang influxd[1756]: ts=2019-07-14T00:55:08.293889Z lvl=info msg="Compacting file" log_id=0GcVQfaG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcVTIt0000 op_name=ts
Jul 14 01:55:08 twang influxd[1756]: ts=2019-07-14T00:55:08.295738Z lvl=info msg="Compacting file" log_id=0GcVQfaG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcVTIt0000 op_name=ts
Jul 14 01:55:08 twang influxd[1756]: ts=2019-07-14T00:55:08.297635Z lvl=info msg="Compacting file" log_id=0GcVQfaG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0GcVTIt0000 op_name=ts
Jul 14 01:55:11 twang influxd[1756]: [httpd] ::1 - username [14/Jul/2019:01:55:10 +0100] "POST /write?consistency=any&db=telegraf HTTP/1.1" 204 0 "-" "telegraf" 07902d7a-a5d2-11e9-8001-b827eb6b4e27 11
Jul 14 01:55:11 twang influxd[1756]: [httpd] ::1 - - [14/Jul/2019:01:55:10 +0100] "POST /write?db=telegraf HTTP/1.1" 204 0 "-" "Telegraf/1.11.1" 079a21ac-a5d2-11e9-8002-b827eb6b4e27 1683504
Jul 14 01:55:12 twang influxd[1756]: [httpd] ::1 - username [14/Jul/2019:01:55:11 +0100] "POST /write?consistency=any&db=telegraf HTTP/1.1" 204 0 "-" "telegraf" 08451343-a5d2-11e9-8003-b827eb6b4e27 17
Jul 14 01:55:12 twang influxd[1756]: [httpd] ::1 - - [14/Jul/2019:01:55:11 +0100] "POST /write?db=telegraf HTTP/1.1" 204 0 "-" "Telegraf/1.11.1" 089bdbca-a5d2-11e9-8004-b827eb6b4e27 1182542
Jul 14 01:55:17 twang influxd[1756]: runtime: out of memory: cannot allocate 8192-byte block (540016640 in use)
Jul 14 01:55:17 twang influxd[1756]: fatal error: out of memory
Jul 14 01:55:17 twang influxd[1756]: runtime: out of memory: cannot allocate 8192-byte block (540016640 in use)
Jul 14 01:55:17 twang influxd[1756]: fatal error: out of memory
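The restart loop can be summarized from the journal text itself. A sketch, assuming the line format shown in the excerpts (an influxd[PID] tag on every line and the Go runtime's "fatal error: out of memory" message); the function name summarize_influxd_journal is hypothetical:

```shell
# summarize_influxd_journal: reads journalctl text on stdin and prints one
# line per influxd PID, in order of first appearance, saying whether that
# run logged a Go "out of memory" fatal error.
summarize_influxd_journal() {
  awk '
    match($0, /influxd\[[0-9]+\]/) {
      # "influxd[" is 8 characters; the trailing "]" is 1 more.
      pid = substr($0, RSTART + 8, RLENGTH - 9)
      if (!(pid in seen)) { seen[pid] = 1; order[++n] = pid }
      if ($0 ~ /fatal error: out of memory/) oom[pid] = 1
    }
    END {
      for (i = 1; i <= n; i++)
        printf "pid=%s oom=%s\n", order[i], (order[i] in oom) ? "yes" : "no"
    }'
}

# Usage on the Pi (unit name as used earlier in the question):
#   sudo journalctl -u influxdb.service --no-pager | summarize_influxd_journal
```

Many different PIDs that all end in oom=yes would confirm that each start of the service dies the same way during compaction.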
Now I need to figure out how to get out of this mess. Any guesses?
Answer 0 (score: 0)
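For some strange reason, what fixed it for me was to stop the service, start influxd by hand, and let it run for a while; after that the high CPU load went away. :D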
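For reference, the steps in this answer might look like the following. This is only a sketch: the unit name and the influxd command line are taken from the question, while running as the influxdb user and the DRY_RUN switch are assumptions added here, not something the answer specifies.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop the systemd unit, then run influxd in the foreground.
# Assumption: running as the influxdb user so that files in the data
# directory keep their usual owner.
influxd_foreground() {
  run sudo systemctl stop influxdb.service &&
  run sudo -u influxdb /usr/bin/influxd -config /etc/influxdb/influxdb.conf
}

# Preview the commands without running them:
#   DRY_RUN=1 influxd_foreground
# When done, stop influxd with Ctrl-C and start the service again:
#   sudo systemctl start influxdb.service
```

Running in the foreground also prints the compaction and out-of-memory messages straight to the terminal instead of the journal.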