Question

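My HDFS cluster has 3 machines: pr2 is the meta node, and pr1 and pr3 are the other 2 nodes. I want to monitor them with Ganglia Web. To that end, Ganglia (both gmetad and gmond) is installed on all 3 nodes, and Ganglia Web is installed on the meta node, pr2.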

I am running the following:

On pr2:

service gmetad start
service gmond start
service httpd start

On pr1 and pr3:

service gmond start

I expected to see information about all 3 nodes in Ganglia Web. But this is what I see:

Only 1 node appears to be running, namely pr2.

What could be the reason?

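While setting up the configuration, I explored the following possibilities:

The gmond configuration file has a section for specifying the "host". What should the host be there? For node pr1, for example, should it be the local machine (pr1) or the meta node (pr2)? I tried pr1 and pr2 in turn, but I still cannot see information about pr1 or pr3 in Ganglia Web. Also, do I need to add anything more to this section?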

[root@pr1]# nano /usr/local/etc/gmond.conf

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified" ### Should I replace "unspecified" with pr2 (the meta node), or with pr1? ###
}

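Further, the same configuration file has a section for configuring udp_send_channel. It has a parameter, mcast_join, which I believe specifies the meta node, so I changed the IP to that of pr2. But since Ganglia Web showed no information for pr1 and pr3, I tried changing it to pr1 and pr3 respectively, to no avail.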

[root@pr1]# nano /usr/local/etc/gmond.conf

udp_send_channel {
  #bind_hostname = yes 
  mcast_join = 239.2.11.71 ### Should I change this IP to pr2 or pr1? ###
  port = 8649
  ttl = 1
}

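Likewise, the same file has a section for configuring udp_recv_channel. Here I commented out mcast_join and bind, because I think nodes pr1 and pr3 do not need to receive anything from anywhere; they only need to send information to the meta node, pr2. Am I wrong?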

[root@pr1]# nano /usr/local/etc/gmond.conf

udp_recv_channel {
  #mcast_join = 239.2.11.71
  port = 8649
  #bind = 239.2.11.71
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}

Then I run the following smoke test:

gmond -d 5 -c /usr/local/etc/gmond.conf

It returns the following (showing the tail of a long output):

...
...
Counting device /dev/mapper/centos-root (2.69 %)
Counting device /dev/mapper/centos-home (0.02 %)
Counting device /dev/sda1 (18.49 %)
For all disks: 206.174 GB total, 204.498 GB free for users.
    metric 'part_max_used' has value_threshold 1.000000
    sent message 'heartbeat' of length 48 with 0 errors
Processing a metric value message from pr1.embs
Got a heartbeat message 1511171541

This means gmond is working fine, am I right?
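A possible additional check, sketched here under assumptions rather than taken from the setup above: a gmond normally serves its current view of the cluster as an XML dump over TCP, so listing the HOST elements in that dump shows which nodes a given gmond has actually heard from. The port 8649 is assumed to be the tcp_accept_channel port (that section of gmond.conf is not shown in this question), and the function names `fetch_gmond_xml` and `hosts_in_dump` are hypothetical helpers, not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump from a gmond (port is an assumption, see above)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_in_dump(xml_bytes):
    """Return the sorted NAME attribute of every HOST element in the dump."""
    root = ET.fromstring(xml_bytes)
    return sorted(h.get("NAME", "") for h in root.iter("HOST"))
```

Running `print(hosts_in_dump(fetch_gmond_xml("pr2")))` on the meta node would list the hosts known to the gmond on pr2. If only pr2 appears there, the metrics from pr1 and pr3 are probably not reaching that gmond at all, which would point at the send/receive channel settings rather than at gmetad or the web front end.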

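Then I configure the gmond init file (for calling the command), making the following 2 changes: (i) pointing GMOND to the correct path, and (ii) pointing the daemon to the correct path of the gmond.conf file. Could I be doing something wrong here?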

[root@pr1]# nano /etc/init.d/gmond

### Change here ###
#GMOND=/usr/sbin/gmond
GMOND=/usr/local/sbin/gmond

# TMPDIR set to SysV IPC ramdrive to avoid include processing failures
TMPDIR=/dev/shm
export TMPDIR

. /etc/rc.d/init.d/functions

start() {
    [ -x $GMOND ] || exit 5
    [ -f /etc/ganglia/gmond.conf ] || exit 6
    echo -n "Starting GANGLIA gmond: "

    ### Change here ###
    #daemon $GMOND
    daemon $GMOND -c /usr/local/etc/gmond.conf

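Then I modify the gmetad.conf file to change data_source and to add the cluster name and user name. Here, should I change the data source to the local machine (pr1 or pr3), or to the meta node (pr2)?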

[root@pr1]# nano /usr/local/etc/gmetad.conf

===(change)
data_source "edhcluster" pr1 ### Should I change this to pr2, the meta node?
===

===(add)
gridname "edhcluster"
===

===(add)
setuid_username "ganglia"
===

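Or, the bigger question: should I leave this file untouched on nodes pr1 and pr3 (since only pr2 is the meta node that will monitor everything), and instead change only the gmetad.conf file of the meta node pr2 so that data_source includes all 3 nodes, as follows:

[root@pr2]# nano /usr/local/etc/gmetad.conf

data_source "edhcluster" pr1 pr2 pr3

However, while browsing for solutions to this problem, I later read somewhere that listing the node names like this means it first checks pr1; if that is down it checks pr2; and if that is also down, it checks pr3. So if we want it to collect data from all 3 machines, we should write a separate line for each node, as follows:

[root@pr2]# nano /usr/local/etc/gmetad.conf

data_source "edhcluster" pr1
data_source "edhcluster" pr2
data_source "edhcluster" pr3

I tried this too, and then restarted all services on all nodes (gmetad, gmond and httpd on pr2, and gmond on pr1 and pr3), but Ganglia Web still shows nothing for the other 2 nodes.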

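Then I modified the gmetad init file (for calling the command) on pr1 and pr3, to make the GMETAD daemon point to the correct path of the gmetad.conf file. However, I suspect this may be wrong, since the gmetad.conf of pr2 should be handling this. So, am I right in thinking this is wrong? If so, how do I make the gmetad daemon point to the path of the configuration file on another node, the meta node pr2?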

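Thank you for going through all the details of the configuration steps. If I have missed or overlooked any other possible cause, please feel free to point it out.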

Ganglia Web not showing the status of other nodes, only the meta node. Possible reasons?
