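I am currently facing an issue where I intermittently see RPC latency problems on my secondary NameNode. The log event for this instance looks like this: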
The health test result for NAME_NODE_RPC_LATENCY has become bad: The moving average of the RPC latency is 6 second(s) over the previous 5 minute(s). The moving average of the queue time is 0 second(s). The moving average of the processing time is 6 second(s). Critical threshold: 5 second(s).
Time: Sep 25, 2015 5:52:02 AM
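We see these RPC errors from time to time. I checked the logs from around the time the problem occurred and found nothing unusual, only entries like this: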
Call#0 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
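This usually happens because the client does not know which NameNode it is connecting to; since this node is in standby state, the client seamlessly fails over to the active NN.
I checked the RPC average queue time and processing time. On one occasion I saw a burst of connections and we got an alert, but on another occasion when the health test went bad there was no burst of requests.
Any suggestions? Is there anything else I can check?
Answer 0 (score: 0)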
The message
Call#0 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
is due to a bug. See https://issues.apache.org/jira/browse/AMBARI-13373 for a report along similar lines.
If you have HA enabled:
The DataNodes may be trying to connect to the standby NameNode because dfs.namenode.rpc-address points at the standby NameNode. Check hdfs-site.xml for this property.
Workaround: remove this property, because dfs.namenode.rpc-address.DEMOMASTER.nn1 and dfs.namenode.rpc-address.DEMOMASTER.nn2 already serve the purpose of dfs.namenode.rpc-address.
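A quick way to confirm whether the extra property is present is to scan hdfs-site.xml for it. The following is only a minimal sketch: the function names, the sample host names, and the assumption that the second NameNode ID is nn2 are illustrative and not taken from any real cluster.

```python
import xml.etree.ElementTree as ET

BARE_KEY = "dfs.namenode.rpc-address"


def find_rpc_address_properties(hdfs_site_xml: str) -> dict:
    """Collect the bare rpc-address property and the per-NameNode ones."""
    root = ET.fromstring(hdfs_site_xml)
    bare = None
    per_namenode = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name == BARE_KEY:
            bare = value
        elif name.startswith(BARE_KEY + "."):
            # e.g. dfs.namenode.rpc-address.DEMOMASTER.nn1
            per_namenode[name] = value
    return {"bare": bare, "per_namenode": per_namenode}


def has_redundant_bare_property(hdfs_site_xml: str) -> bool:
    """True when the bare key exists alongside HA per-NameNode keys."""
    found = find_rpc_address_properties(hdfs_site_xml)
    return found["bare"] is not None and len(found["per_namenode"]) > 0


# Hypothetical hdfs-site.xml content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>standby.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.DEMOMASTER.nn1</name>
    <value>active.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.DEMOMASTER.nn2</name>
    <value>standby.example.com:8020</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(has_redundant_bare_property(SAMPLE))
```

To check a real cluster, pass the text of your own hdfs-site.xml instead of SAMPLE.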
How to remove it?
Use the configs.sh utility on the Ambari Server to delete the extra property:
/var/lib/ambari-server/resources/scripts/configs.sh -u <admin.user> -p <admin.password> delete <ambari.server> <cluster.name> hdfs-site "dfs.namenode.rpc-address"
Here, admin.user and admin.password are the credentials of an Ambari Administrator, ambari.server is the Ambari Server host, and cluster.name is the name of your cluster.
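If you script this step, the configs.sh invocation above can be assembled as an argument list so that each placeholder ends up in the right position. This is only a sketch: build_delete_command is a hypothetical helper, the credentials, host, and cluster name are made-up values, and nothing is executed here.

```python
import shlex

CONFIGS_SH = "/var/lib/ambari-server/resources/scripts/configs.sh"


def build_delete_command(admin_user, admin_password, ambari_server, cluster_name,
                         config_type="hdfs-site",
                         key="dfs.namenode.rpc-address"):
    """Return the argument list for the delete call, in the order shown above."""
    return [
        CONFIGS_SH,
        "-u", admin_user,
        "-p", admin_password,
        "delete",
        ambari_server,
        cluster_name,
        config_type,
        key,
    ]


if __name__ == "__main__":
    # Hypothetical values; substitute your own before running anything.
    cmd = build_delete_command("admin", "secret", "ambari.example.com", "DemoCluster")
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

Once the printed command looks right, the same list could be passed to subprocess.run(cmd, check=True) on the Ambari Server host.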