Question

I am trying to read a Kafka topic with Spark Structured Streaming from inside spark-shell, but I do not seem to get a single row from Kafka.

Kafka on its own works fine (tested with the console consumer and the console producer):

~/opt/bd/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testtopic --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
first
thrid
fifth
seventh
eight
bla
blal2
das ist 
testmaschine
hallo
kleiner
blsllslsd

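I expected the streaming query in spark-shell to fetch the messages already stored in Kafka for this topic and print all of them in the shell. But nothing is printed. Where is my mistake? I am using Spark 2.0.2 and Kafka 0.10.2.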

Answer 1

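You need to change the port of the Kafka bootstrap server. The kafka.bootstrap.servers option must point at a Kafka broker (port 9092 by default), not at ZooKeeper (port 2181, as used in the console consumer command above). Like this: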

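// kafka.bootstrap.servers must name a Kafka broker (default port 9092)
val ds1 = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "testtopic")
  .option("startingOffsets", "earliest")  // read the topic from the beginning
  .load()

// Print each micro-batch to the console
ds1.writeStream.format("console").start()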

After that, you should get values from readStream.

I hope this helps!
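One follow-up that may be useful: the Kafka source delivers key and value as binary columns, so the console sink shows byte arrays rather than the message text. The sketch below casts both columns to strings before printing. It assumes spark-shell was started with the spark-sql-kafka-0-10 package on the classpath and that a broker listens on localhost:9092. The names messages and query are illustrative, and the option names are copied from the answer above, so check them against the Kafka integration guide for your Spark version. In spark-shell, entering the block through :paste avoids problems with lines that start with a dot.

```scala
// Sketch for spark-shell, where `spark` (a SparkSession) is already defined.
// Assumption: a Kafka broker is reachable at localhost:9092.
val messages = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "testtopic")
  .option("startingOffsets", "earliest")
  .load()
  // key and value arrive as binary; cast them to strings for readable output
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

// Start the streaming query and keep a handle to it
val query = messages
  .writeStream
  .format("console")
  .start()
```

Keeping the handle lets you end the stream from the shell with query.stop(). In a standalone application you would typically call query.awaitTermination() instead, so the driver does not exit while the stream is running.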
