I need to consume messages from a remote Kafka topic using Scala and Spark. On the remote machine, the Kafka port is set to 7072, not the default 9092. In addition, the following versions are installed on the remote machine:

This means that I should pass the broker list (with port 7072) from Scala to the remote Kafka; otherwise, it will try to use the default port.
The problem is that, according to the logs, the remote machine does not recognize the parameter bootstrap.servers. I also tried renaming this parameter to metadata.broker.list, broker.list, and listeners, but the logs always show the same error, Property bootstrap.servers is not valid, after which port 9092 is used by default (and, obviously, no messages get consumed).
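To rule out the default-port fallback on the client side, I assemble the broker list string explicitly. A minimal sketch (the BrokerList helper and the host addresses here are hypothetical, just to illustrate the expected host:port format):

```scala
// Hypothetical helper: build a "host:port,host:port" broker list with an
// explicit non-default port, so the client never falls back to 9092.
object BrokerList {
  def format(hosts: Seq[String], port: Int): String =
    hosts.map(h => s"$h:$port").mkString(",")
}

object BrokerListDemo extends App {
  val brokers = BrokerList.format(Seq("10.0.0.1", "10.0.0.2"), 7072)
  println(brokers) // prints 10.0.0.1:7072,10.0.0.2:7072
}
```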
In the POM file, I use the following dependencies for Kafka and Spark:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.2</version>
</dependency>
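For reference, spark-streaming-kafka_2.10:1.6.2 pulls in an 0.8-line Kafka client transitively. If you ever need to pin that client explicitly, a sketch (the exact 0.8.2.1 version is my assumption about the transitive dependency; check mvnrepository to confirm):

```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <!-- assumption: the 0.8.x client line that Spark 1.6.2 was compiled against -->
    <version>0.8.2.1</version>
</dependency>
```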
So, I am using Scala 2.10, not 2.11.

Here is my Scala code (it works fine when I use my own Kafka installed on an EMR machine in Amazon Cloud, where Kafka runs on port 9092):
val testTopicMap = testTopic.split(",").map((_, kafkaNumThreads.toInt)).toMap
val kafkaParams = Map[String, String](
  "broker.list" -> "XXX.XX.XXX.XX:7072",
  "zookeeper.connect" -> "XXX.XX.XXX.XX:2181",
  "group.id" -> "test",
  "zookeeper.connection.timeout.ms" -> "10000",
  "auto.offset.reset" -> "smallest")

val testEvents: DStream[String] =
  KafkaUtils
    .createStream[String, String, StringDecoder, StringDecoder](
      ssc,
      kafkaParams,
      testTopicMap,
      StorageLevel.MEMORY_AND_DISK_SER_2
    ).map(_._2)
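One detail worth noting (my understanding of the 0.8-era consumer API, not something stated in the docs above): the receiver-based createStream connects through Zookeeper, so the broker.list entry is most likely ignored there, while the direct stream is the one that reads metadata.broker.list. A minimal sketch separating the two parameter sets, using the same placeholder addresses as above:

```scala
// Sketch (assumption): the receiver-based stream only needs Zookeeper,
// while the direct stream only needs the broker list; neither uses
// bootstrap.servers in the 0.8 consumer API.
object KafkaParamsDemo extends App {
  val receiverParams = Map(
    "zookeeper.connect" -> "XXX.XX.XXX.XX:2181", // read by createStream
    "group.id" -> "test",
    "auto.offset.reset" -> "smallest")

  val directParams = Map(
    "metadata.broker.list" -> "XXX.XX.XXX.XX:7072", // read by createDirectStream
    "auto.offset.reset" -> "smallest")

  assert(!receiverParams.contains("bootstrap.servers"))
  assert(!directParams.contains("bootstrap.servers"))
  println("ok")
}
```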
I have been reading this Documentation, and it looks like everything I am doing is correct. Should I use some other Kafka client API (other Maven dependencies)?

UPDATE #1:

I also tried a Direct Stream (without Zookeeper), but it fails with an error:
val testTopicMap = testTopic.split(",").toSet
val kafkaParams = Map[String, String](
  "metadata.broker.list" -> "XXX.XX.XXX.XX:7072,XXX.XX.XXX.XX:7072,XXX.XX.XXX.XX:7072",
  "bootstrap.servers" -> "XXX.XX.XXX.XX:7072,XXX.XX.XXX.XX:7072,XXX.XX.XXX.XX:7072",
  "auto.offset.reset" -> "smallest")
val testEvents = KafkaUtils
  .createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, testTopicMap)
  .map(_._2)
testEvents.print()
17/01/02 12:23:15 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
UPDATE #2:

I found this related topic. The suggested solution there was: "Fixed it by setting the property 'advertised.host.name' as instructed by the comments in the kafka configuration (config/server.properties)". Do I understand correctly that config/server.properties should be changed on the remote machine where Kafka is installed?
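If so, the change would presumably look something like this in config/server.properties on the remote broker (a sketch under the assumption of an 0.8/0.9-style configuration; the address is a placeholder for the machine's externally reachable name):

```properties
# Address the broker advertises to clients; it must be reachable from the
# Spark driver and executors, not an internal-only hostname.
advertised.host.name=XXX.XX.XXX.XX
# Port the broker listens on and advertises (non-default here).
port=7072
advertised.port=7072
```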
Kafka : How to connect kafka-console-consumer to fetch remote broker topic content?
Answer 0 (score: 0)
I think I recently ran into the same problem (the EOFException), and the cause was a Kafka version mismatch.

If I look here, https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka_2.10/1.6.2, the compile-time Kafka dependency of the streaming module is 0.8, while you are using 0.10. As far as I know, 0.9 is already incompatible with 0.8. Could you try setting up a local 0.8 or 0.9 broker and connecting to it?