Spark error: Failed to send RPC to Datanode

Posted: 2018-02-07 16:20:50

Tags: hadoop apache-spark hive spark-streaming ambari

We are running into a few problems with the Spark Thrift Server.

From the log we can see: Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149

Please advise: why does this happen, and what is the solution?

Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149: java.nio.channels.ClosedChannelException
more spark-hive-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master03.sys67.com.out


Spark Command: /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.0.3-8 -cp /usr/hdp/current/spark2-thriftserver/conf/:/usr/hdp/current/spark2-thriftserver/jars/*:/usr/hdp/current/hadoop-client/conf/ -Xmx10000m org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=15g --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server --executor-cores 7 spark-internal
========================================
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
18/02/07 17:55:21 ERROR TransportClient: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/02/07 17:55:21 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(2,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio.channels.ClosedChannelException
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249)
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:514)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:488)
        at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
        at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:438)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/02/07 17:55:21 ERROR SparkContext: Error initializing SparkContext.
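The YarnSchedulerBackend line in this log shows that the failing RPC (same id, same timestamp) was a RequestExecutors message sent to the YARN ApplicationMaster (AM), so the address 12.87.2.64:50149 appears to be the AM's endpoint rather than a DataNode service. To see which host:port pairs fail across a long .out file, a small Python sketch like the one below could help. It is not part of the original question; the regex assumes exactly the message format shown above, and the helper name failed_rpc_targets is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed message format, e.g.
# "Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio..."
RPC_FAILURE = re.compile(r"Failed to send RPC (\d+) to /?([^\s:/]+):(\d+)")


def failed_rpc_targets(lines: Iterable[str]) -> Dict[Tuple[str, int], Set[str]]:
    """Map each (host, port) to the set of distinct RPC ids that failed there.

    Ids are collected in a set because the same failure is logged more than
    once (by TransportClient and again inside the IOException message).
    """
    targets: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for line in lines:
        for m in RPC_FAILURE.finditer(line):
            targets[(m.group(2), int(m.group(3)))].add(m.group(1))
    return dict(targets)


if __name__ == "__main__":
    # Two lines copied from the log above; for a real file, pass the lines of
    # open("<thrift server .out file>") instead.
    sample = [
        "18/02/07 17:55:21 ERROR TransportClient: Failed to send RPC 9053901149358924945 "
        "to /12.87.2.64:50149: java.nio.channels.ClosedChannelException",
        "java.io.IOException: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: "
        "java.nio.channels.ClosedChannelException",
    ]
    print(failed_rpc_targets(sample))
```

On the two sample lines it reports a single target, ('12.87.2.64', 50149), with one distinct RPC id, since both lines describe the same failure.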

We also tried to get some pointers from this link: https://thebipalace.com/2017/08/23/spark-error-failed-to-send-rpc-to-datanode/

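But this is a new Ambari cluster, and we don't think that article applies to this particular problem (there are currently no Spark jobs running on our Ambari cluster).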

2 answers:

Answer 0 (score: 1)

In my case, I reduced the driver and executor memory from 8G to 4G:

spark.driver.memory=4G
spark.executor.memory=4G

Check your node configuration: you should not request more memory than is actually available.
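As a rough illustration of "don't request more memory than is available", the Python sketch below (not from the answer; parse_mem_mib, container_mib and fits are hypothetical names) estimates the container size YARN is asked for per executor and compares it with yarn.scheduler.maximum-allocation-mb. It assumes Spark 2.x defaults on YARN, where the memory overhead is max(384 MiB, 10% of the heap), and it ignores YARN's rounding of requests up to yarn.scheduler.minimum-allocation-mb. In yarn-client mode, as used by the Thrift Server here, the driver runs outside YARN, so the check applies to executors (and to the driver only in cluster mode).

```python
import re

# Assumed Spark 2.x defaults for the YARN memory overhead; check your version.
MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10

_UNITS_MIB = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def parse_mem_mib(value: str) -> int:
    """Convert a Spark memory string such as '4G', '512m' or '2gb' to MiB.

    A bare number is treated as MiB, as for spark.driver.memory and
    spark.executor.memory.
    """
    m = re.fullmatch(r"\s*(\d+)\s*(?:([kmgt])b?)?\s*", value.lower())
    if not m:
        raise ValueError(f"unrecognised memory string: {value!r}")
    amount, unit = int(m.group(1)), m.group(2) or "m"
    return int(amount * _UNITS_MIB[unit])


def container_mib(heap: str) -> int:
    """Heap plus the default overhead, i.e. the memory requested from YARN."""
    heap_mib = parse_mem_mib(heap)
    return heap_mib + max(MIN_OVERHEAD_MIB, int(OVERHEAD_FACTOR * heap_mib))


def fits(heap: str, max_allocation_mib: int) -> bool:
    """True if the request is within yarn.scheduler.maximum-allocation-mb."""
    return container_mib(heap) <= max_allocation_mib


if __name__ == "__main__":
    max_alloc = 8192  # hypothetical yarn.scheduler.maximum-allocation-mb value
    for heap in ("8G", "4G"):
        verdict = "fits" if fits(heap, max_alloc) else "too large"
        print(heap, container_mib(heap), "MiB ->", verdict)
```

With the hypothetical 8192 MiB limit, the demo prints "8G 9011 MiB -> too large" and "4G 4505 MiB -> fits", which shows one way a cut from 8G to 4G, as in the answer, could turn an unschedulable request into a schedulable one.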

Answer 1 (score: 0)

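This may be caused by insufficient disk space. In my case, I was running a Spark job on AWS EMR with 1 r4.2xlarge (master) and 2 r4.8xlarge (core) nodes. Tuning Spark and adding more worker nodes solved my problem. The most common causes are memory pressure, misconfiguration (for example, wrongly sized executors), long-running tasks, and tasks that result in Cartesian operations. You can speed up jobs with appropriate caching and by accounting for data skew. For the best performance, monitor and review the execution of long-running, resource-intensive Spark jobs. Hope this helps.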
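To test the disk-space hypothesis on each node, a minimal Python sketch like the following could report nearly full directories. It is an illustration, not the answerer's method; low_space_dirs is a hypothetical helper. On YARN, Spark writes shuffle and spill files to the NodeManager's yarn.nodemanager.local-dirs (spark.local.dir is ignored there), and container logs go to yarn.nodemanager.log-dirs, so those are the directories worth checking. The default 10% free threshold is chosen to mirror YARN's disk health checker, which by default treats a disk as bad above 90% utilization (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage); verify that default against your Hadoop version.

```python
import shutil
import tempfile
from typing import Iterable, List, Tuple


def low_space_dirs(dirs: Iterable[str], min_free_fraction: float = 0.10) -> List[Tuple[str, float]]:
    """Return (path, free_fraction) for each directory whose filesystem has
    less than min_free_fraction of its capacity free."""
    flagged = []
    for path in dirs:
        usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            flagged.append((path, free_fraction))
    return flagged


if __name__ == "__main__":
    # Placeholder: substitute the paths from yarn.nodemanager.local-dirs and
    # yarn.nodemanager.log-dirs on each node.
    local_dirs = [tempfile.gettempdir()]
    for path, frac in low_space_dirs(local_dirs):
        print(f"{path}: only {frac:.1%} free")
```

Running it on every NodeManager host (for example from a cron job or a configuration-management tool) turns "maybe the disks are full" into a concrete list of directories below the threshold.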

Reference: EMR Spark - TransportClient: Failed to send RPC