我在运行从Kafka读取的Spark Streaming作业时遇到了一个奇怪的问题。我在CDH 5.8.3发行版上:Spark版本是1.6.0,Kafka版本是0.9.0。
我的代码非常简单:
val kafkaParams = Map[String, String]("bootstrap.servers" -> brokersList, "auto.offset.reset" -> "smallest")
KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, Set(kafkaTopic))
如果我在 yarn-client 模式下运行它,我没有错误。如果我在 yarn-cluster 模式下运行程序,我会收到异常。我的启动命令是:
spark-submit --master yarn-cluster --files /etc/hbase/conf/hbase-site.xml --num-executors 5 --executor-memory 4G --jars (somejars for HBase interaction) --class mypackage.MyClass myJar.jar
但是我收到了这个错误:
java.lang.ClassCastException: kafka.cluster.Broker cannot be cast to kafka.cluster.BrokerEndPoint
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6$$anonfun$apply$7.apply(KafkaCluster.scala:90)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:90)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:87)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3.apply(KafkaCluster.scala:87)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3.apply(KafkaCluster.scala:86)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2.apply(KafkaCluster.scala:86)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2.apply(KafkaCluster.scala:85)
at scala.util.Either$RightProjection.flatMap(Either.scala:523)
at org.apache.spark.streaming.kafka.KafkaCluster.findLeaders(KafkaCluster.scala:85)
at org.apache.spark.streaming.kafka.KafkaCluster.getLeaderOffsets(KafkaCluster.scala:179)
at org.apache.spark.streaming.kafka.KafkaCluster.getLeaderOffsets(KafkaCluster.scala:161)
at org.apache.spark.streaming.kafka.KafkaCluster.getEarliestLeaderOffsets(KafkaCluster.scala:155)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$5.apply(KafkaUtils.scala:213)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$5.apply(KafkaUtils.scala:211)
at scala.util.Either$RightProjection.flatMap(Either.scala:523)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at myPackage.Ingestion$.createStreamingContext(Ingestion.scala:120)
at myPackage.Ingestion$$anonfun$1.apply(Ingestion.scala:55)
at myPackage.Ingestion$$anonfun$1.apply(Ingestion.scala:55)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:864)
at myPackage.Ingestion$.main(Ingestion.scala:55)
at myPackage.Ingestion.main(Ingestion.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
上网我终于认为这是一个版本问题,但我无法弄清楚为什么会发生这种情况,因为在纱线客户端和纱线集群模式下,罐子都是相同的。
你有什么想法吗?
谢谢你, 马可
答案 0 :(得分:1)
看起来Spark streaming 1.6与Kafka 0.8兼容(见documentation)
我猜您正在使用Kafka客户端0.9,它会从您的jar中以客户端模式获取,但当您切换到群集模式时,默认Kafka客户端使用(0.8.2.1)。
我是对的吗?如果是这样,您是否可以尝试从构建中删除kafka客户端依赖项并使用CyclicBehaviour
提供的默认依赖项? (0.8客户应与0.9经纪人合作)。
答案 1 :(得分:0)
对于那些可能遇到同样问题的人,我们的问题是由于Splice Machine安装造成的。实际上,Splice Machine需要在YARN额外的罐子配置中设置它的罐子(其中还有一个由它们组成的火花组件)。
现在,我们正试图找到一种方法,让所有事情都能在没有unisntall Splice Machine的情况下运行。
感谢。