Spark Streaming - KafkaWordCount fails to run on a Spark standalone cluster

Date: 2016-04-12 15:09:50

Tags: apache-spark spark-streaming

I need your help and advice on running the Apache Spark KafkaWordCount example on a standalone Spark cluster:

I can run the Spark KafkaWordCount example in local mode via

spark-submit .... --master local[4] ....
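For reference, a complete local-mode invocation might look like the sketch below. The examples-jar filename, ZooKeeper quorum (mykafka:2181), consumer group, topic name, and thread count are assumptions for illustration, not details from the original post:

```shell
# Hypothetical full command for the Spark 1.x KafkaWordCount example,
# whose usage is: KafkaWordCount <zkQuorum> <group> <topics> <numThreads>
spark-submit \
  --class org.apache.spark.examples.streaming.KafkaWordCount \
  --master local[4] \
  $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar \
  mykafka:2181 my-consumer-group test 1
```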

I can receive messages from the Kafka server on another node (a VM) and print the results to the terminal console.

However, when I submit the application to the Spark standalone cluster via

spark-submit .... --master spark://master:7077 ....

I find exceptions in the $SPARK_HOME/work/../../stderr file on every worker node, and the word-count results for each batch are printed to $SPARK_HOME/work/../../stdout on each worker node instead of the driver console.

Below are the resource settings in $SPARK_HOME/conf/spark-env.sh for each of my Spark worker nodes:

export SPARK_MASTER_IP=master
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=3g
export SPARK_WORKER_INSTANCES=2

I have 5 VM nodes (by hostname): mykafka, master, data1, data2, and data3.

Thanks in advance for any help and suggestions.

Below is the RpcTimeoutException thrown on each worker:

16/04/11 23:07:30 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(5,[Lscala.Tuple2;@2628a359,BlockManagerId(5, data3, 34838))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 10 seconds. This timeout is controlled by spark.executor.heartbeatInterval
  at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
  at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
  at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
  at scala.util.Try$.apply(Try.scala:161)
  at scala.util.Failure.recover(Try.scala:185)
  at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
  ....
  ....
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 10 seconds
  at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:242)
  ... 7 more
16/04/11 23:07:31 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM

1 Answer:

Answer 0 (score: 3):

So I ran into exactly the same problem with this example; it appears to be related to this bug: https://issues.apache.org/jira/browse/SPARK-13906

I'm not sure how to set this for the stock example, but while experimenting with the code in a small Scala application, I added an extra configuration parameter to SparkConf():

val conf = new SparkConf()
  .setAppName("name")
  .set("spark.rpc.netty.dispatcher.numThreads", "2")
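If you would rather not modify the example's source, the same property can presumably be passed directly to spark-submit with the --conf flag. A sketch, where the example class name, jar path, and application arguments are assumptions for illustration:

```shell
# Hypothetical: apply the workaround to the stock example without code changes
spark-submit \
  --master spark://master:7077 \
  --conf spark.rpc.netty.dispatcher.numThreads=2 \
  --class org.apache.spark.examples.streaming.KafkaWordCount \
  $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar \
  mykafka:2181 my-consumer-group test 1
```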

Credit goes to David Gomez and the spark-user mailing list; after a lot of googling, I found the solution here:

https://mail-archives.apache.org/mod_mbox/spark-user/201603.mbox/%3CCAAn_Wz1ik5YOYych92C85UNjKU28G+20s5y2AWgGrOBu-Uprdw@mail.gmail.com%3E