I am trying to use Spark Streaming 2.2.0 with Kafka 0.8.
I followed this documentation: https://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html
But I am running into this error:
[WARN ] 2018-01-25 14:54:01,332 org.apache.spark.scheduler.TaskSetManager - Lost task 3.0 in stage 0.0 (TID 3, ip-10-0-155-42.eu-west-1.compute.internal, executor 8): java.lang.NoSuchMethodError: net.jpountz.util.Utils.checkRange([BII)V
at org.apache.kafka.common.message.KafkaLZ4BlockInputStream.read(KafkaLZ4BlockInputStream.java:176)
Looking at the dependency graph, it seems that lz4 1.3.0 is being pulled in:
org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0
org.apache.spark:spark-streaming_2.11:2.2.0
org.apache.spark:spark-core_2.11:2.2.0
net.jpountz.lz4:lz4:1.3.0
whereas Kafka needs lz4:1.2.0.
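To double-check which lz4 jar actually wins on the runtime classpath, a small diagnostic along these lines might help. This is only a sketch: the class name `WhichJar` and the helper `locationOf` are hypothetical names made up for illustration, and the two class names it looks up are taken from the stack traces in this question.

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the location (jar or directory) a class was loaded from,
    // or a short note when the class is missing or has no code source.
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null
                    ? "(bootstrap or unknown source)"
                    : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class names copied from the NoSuchMethodError / NoClassDefFoundError traces.
        String[] names = {"net.jpountz.util.Utils", "net.jpountz.util.SafeUtils"};
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Run with the same classpath as the job, this should print the jar each class comes from, or report that the class is missing. On a cluster the executors may see a different classpath than the driver, so the result is only indicative.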
[UPDATE] If I force the lz4 version to 1.2.0, I get a different error:
Caused by: java.lang.NoClassDefFoundError: net/jpountz/util/SafeUtils
at org.apache.spark.io.LZ4BlockInputStream.read(LZ4BlockInputStream.java:124)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2606)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2622)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3099)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:63)
at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:63)
at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:122)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:291)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:226)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
How can I fix this?