Error when using Spark Streaming with Kafka

Date: 2018-01-29 11:40:45

Tags: apache-spark apache-kafka spark-streaming

When I launch my streaming job via spark-submit, I get warning messages about invalid Kafka properties:

VerifiableProperties: Property auto.offset.reset is overridden to largest
VerifiableProperties: Property enable.auto.commit is not valid.
VerifiableProperties: Property sasl.kerberos.service.name is not valid
VerifiableProperties: Property key.deserializer is not valid
...
VerifiableProperties: Property zookeeper.connect is overridden to ....

The property zookeeper.connect is picked up correctly.

How can the following property

kafkaParams.put("enable.auto.commit", "false");

...be considered invalid?

In my Java code I supply the Kafka parameters in a HashMap and pass it to the KafkaUtils.createDirectStream() API. I don't understand why these properties are being flagged as invalid.
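For reference, a minimal sketch of such a setup, assuming the spark-streaming-kafka-0-8 direct API (the 0.8-style consumer is where kafka.utils.VerifiableProperties comes from); the broker address, topic name, app name and batch interval are placeholders, not values from the original job:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import kafka.serializer.DefaultDecoder;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaDirectStreamDemo {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("kafka-direct-demo");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Kafka parameters are passed as a plain Map<String, String>
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:9092"); // placeholder broker list
        kafkaParams.put("auto.offset.reset", "largest");
        kafkaParams.put("enable.auto.commit", "false");          // the property reported as "not valid"

        Set<String> topics = Collections.singleton("my-topic");  // placeholder topic

        JavaPairInputDStream<String, byte[]> directStream = KafkaUtils.createDirectStream(
                jssc,
                String.class, byte[].class,
                StringDecoder.class, DefaultDecoder.class,
                kafkaParams,
                topics);

        directStream.print();
        jssc.start();
        jssc.awaitTermination();
    }
}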

When I try to print the JavaPairInputDStream with

directStream.print();

the following exception is thrown:

java.io.EOFException: Received -1 when reading from channel, socket had likely been closed.

The jaas.conf file is supplied to the spark-submit command via
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf"

The private key and public key are included in the spark-submit command:

--files jaas.conf,privatekey,publickey
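Put together, the submit command might look roughly like this; the master, deploy mode, main class, and application jar are assumptions, and the spark.driver.extraJavaOptions line is added only as a common companion setting, not part of the original command:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.StreamingJob \
  --files jaas.conf,privatekey,publickey \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  streaming-job.jar
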

1 Answer:

Answer 0 (score: 0)

I tested in the same class with a receiver-based stream

JavaPairReceiverInputDStream<String, byte[]> receiverStream = KafkaUtils.createStream(...);

...and a direct stream

JavaPairInputDStream<String, byte[]> directStream = KafkaUtils.createDirectStream(...);

I found some of the necessary changes here:

https://community.hortonworks.com/questions/6332/how-to-read-from-a-kafka-topic-using-spark-streami.html

It looks like only the receiver-based stream works with the given versions of the software stack.
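A sketch of how the receiver-based call could be filled in for the same <String, byte[]> types; the ZooKeeper quorum, group id, topic, and storage level are placeholders:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import kafka.serializer.DefaultDecoder;
import kafka.serializer.StringDecoder;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaReceiverStreamDemo {
    // jssc is assumed to be an already created JavaStreamingContext
    static JavaPairReceiverInputDStream<String, byte[]> buildReceiverStream(JavaStreamingContext jssc) {
        // Old (0.8) consumer properties: the receiver-based stream tracks offsets via ZooKeeper
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("zookeeper.connect", "zk1:2181");  // placeholder quorum
        kafkaParams.put("group.id", "demo-group");         // placeholder consumer group
        kafkaParams.put("auto.offset.reset", "largest");

        // topic name -> number of receiver threads
        Map<String, Integer> topics = Collections.singletonMap("my-topic", 1); // placeholder topic

        return KafkaUtils.createStream(
                jssc,
                String.class, byte[].class,
                StringDecoder.class, DefaultDecoder.class,
                kafkaParams,
                topics,
                StorageLevel.MEMORY_AND_DISK_SER());
    }
}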

However, the warning messages from VerifiableProperties still appear. That part remains unresolved.