Standby NN RPC latency issue

Time: 2015-09-27 19:10:56

Tags: hadoop hadoop2

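I am currently facing a problem where, from time to time, I see RPC latency issues on my standby NameNode. The log event for this instance looks like this: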

The health test result for NAME_NODE_RPC_LATENCY has become bad: The moving average of the RPC latency is 6 second(s) over the previous 5 minute(s). The moving average of the queue time is 0 second(s). The moving average of the processing time is 6 second(s). Critical threshold: 5 second(s). 
Time: Sep 25, 2015 5:52:02 AM 

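We see these RPC errors from time to time. I looked through the logs and could not see anything different.

I checked the logs from the time the problem occurred and found nothing unusual, only entries like this: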

Call#0 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby

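This usually happens because the client does not know which NameNode it is connecting to; since this node is in standby state, the client fails over seamlessly to the active NN.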

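I checked the RPC average queue time and processing time. On one occasion I saw a burst of connections when we got the alert, but on another occasion when the health test went bad there was no burst of requests.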
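For anyone who wants to sample these two averages directly rather than through the monitoring UI, here is a minimal sketch that reads them from the NameNode's JMX servlet. The host, the web UI port 50070, and the RPC port 8020 in the bean name are assumptions based on common Hadoop 2 defaults, not values from this cluster; the helper names are hypothetical. To my knowledge both metrics are reported in milliseconds.

```python
import json
from urllib.request import urlopen

# Assumed bean name: 8020 is a common default RPC port, not taken from the question.
RPC_BEAN = "Hadoop:service=NameNode,name=RpcActivityForPort8020"


def rpc_times(jmx_payload):
    """Return (queue_time_avg, processing_time_avg) from a parsed /jmx response."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == RPC_BEAN:
            return bean["RpcQueueTimeAvgTime"], bean["RpcProcessingTimeAvgTime"]
    raise LookupError("bean not found: " + RPC_BEAN)


def fetch_rpc_times(host, port=50070):
    """Query the NameNode web UI's JMX servlet (host and port are assumptions)."""
    url = "http://{}:{}/jmx?qry={}".format(host, port, RPC_BEAN)
    with urlopen(url, timeout=10) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return rpc_times(payload)
```

Calling fetch_rpc_times against the standby NameNode at a fixed interval and logging the result next to a timestamp would make it easier to see whether the processing-time spikes line up with connection bursts or with something else on the host.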

Any suggestions? Is there anything else I can check?

1 answer:

Answer 0 (score: 0)

The message "Call#0 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby" is due to a bug.

See https://issues.apache.org/jira/browse/AMBARI-13373 for a report along similar lines.
If you have HA enabled:
DataNodes may be trying to connect to the standby NameNode because the standby NameNode's address is set in dfs.namenode.rpc-address. Check hdfs-site.xml.

Workaround: Remove this property, because dfs.namenode.rpc-address.DEMOMASTER.nn1 and dfs.namenode.rpc-address.DEMOMASTER.nn2 already serve the purpose of dfs.namenode.rpc-address.
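Before removing anything, it may help to confirm that the bare property is actually present alongside the per-NameNode ones. The following is a minimal sketch, assuming a standard Hadoop-style hdfs-site.xml; the function names are hypothetical, and it only reads the file, it does not change it.

```python
import xml.etree.ElementTree as ET

BARE_KEY = "dfs.namenode.rpc-address"


def read_properties(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_rpc_address(props):
    """Return (value of the bare key or None, sorted list of per-NameNode keys)."""
    ha_keys = sorted(k for k in props if k.startswith(BARE_KEY + "."))
    return props.get(BARE_KEY), ha_keys
```

Pass the contents of your hdfs-site.xml (its location depends on the deployment) to read_properties and then to check_rpc_address. If the first returned value is not None and the list of per-NameNode keys is non-empty, the bare property is the extra one described above.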

How to remove?

Use the configs.sh utility on the Ambari Server to delete the extra property.

/var/lib/ambari-server/resources/scripts/configs.sh -u <admin.user> -p <admin.password> delete <ambari.server> <cluster.name> hdfs-site "dfs.namenode.rpc-address"

Where admin.user and admin.password are the credentials of an Ambari Administrator, ambari.server is the Ambari Server host, and cluster.name is the name of your cluster.