Cannot pass the broker list parameter from Scala to Kafka: Property bootstrap.servers is not valid

Date: 2017-01-02 10:14:34

Tags: scala apache-spark apache-kafka spark-streaming

I need to consume messages from a remote Kafka topic using Scala and Spark. The Kafka port on the remote machine is set to 7072, not the default 9092. In addition, the following versions are installed on that machine:

  1. Kafka 0.10.1.0
  2. Scala 2.11

This means I should pass the broker list (with port 7072) from Scala to the remote Kafka; otherwise it will try to use the default port.

The problem is that, according to the logs, the remote machine does not recognize the bootstrap.servers parameter. I also tried renaming this parameter to metadata.broker.list, broker.list, and listeners, but I always get the same error in the logs, Property bootstrap.servers is not valid, after which the default port 9092 is used (and, obviously, no messages are consumed).
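For reference, the consumer code paths involved here expect different property names, which is why renaming the key alone does not help. A minimal sketch in plain Scala (remote-host and the ports are placeholders, not real addresses):

```scala
// Sketch: property names differ between Kafka consumer APIs.
// "remote-host" is a placeholder for the actual broker/ZooKeeper address.

// Old high-level consumer (used by receiver-based KafkaUtils.createStream):
// brokers are discovered through ZooKeeper, so no broker list is passed at all.
val oldConsumerParams = Map[String, String](
  "zookeeper.connect" -> "remote-host:2181",
  "group.id" -> "test")

// 0.8 direct stream (SimpleConsumer-based): brokers are listed directly.
val directStreamParams = Map[String, String](
  "metadata.broker.list" -> "remote-host:7072")

// New consumer API (Kafka 0.9+ clients): bootstrap.servers is the valid key.
val newConsumerParams = Map[String, String](
  "bootstrap.servers" -> "remote-host:7072",
  "group.id" -> "test")
```

Since the 0.8-era consumer validates its configuration, an unknown key such as bootstrap.servers would explain the Property bootstrap.servers is not valid warning.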

In the POM file I use the following dependencies for Kafka and Spark:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.6.2</version>
    </dependency>
    
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>1.6.2</version>
    </dependency>
    

So, I am using Scala 2.10, not 2.11.

Here is my Scala code (it works fine when I use my own Kafka installed on an Amazon EMR machine in the cloud, where Kafka uses port 9092):

    val testTopicMap = testTopic.split(",").map((_, kafkaNumThreads.toInt)).toMap

    val kafkaParams = Map[String, String](
      "broker.list" -> "XXX.XX.XXX.XX:7072",
      "zookeeper.connect" -> "XXX.XX.XXX.XX:2181",
      "group.id" -> "test",
      "zookeeper.connection.timeout.ms" -> "10000",
      "auto.offset.reset" -> "smallest")

    val testEvents: DStream[String] =
      KafkaUtils
        .createStream[String, String, StringDecoder, StringDecoder](
          ssc,
          kafkaParams,
          testTopicMap,
          StorageLevel.MEMORY_AND_DISK_SER_2
        ).map(_._2)
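
The topic-to-thread mapping in the first line can be checked in isolation; a minimal pure-Scala sketch, with made-up values standing in for testTopic and kafkaNumThreads:

```scala
// Made-up inputs; in the real job these come from the configuration.
val testTopic = "topicA,topicB"
val kafkaNumThreads = "2"

// Each comma-separated topic is paired with the receiver thread count.
val testTopicMap: Map[String, Int] =
  testTopic.split(",").map((_, kafkaNumThreads.toInt)).toMap
// => Map("topicA" -> 2, "topicB" -> 2)
```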
    

I have been reading this Documentation, but it looks like everything I am doing is correct. Should I use some other Kafka client API (other Maven dependencies)?

Update #1:

I also tried a Direct Stream (without ZooKeeper), but it gives me an error:

    val testTopicMap = testTopic.split(",").toSet
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> "XXX.XX.XXX.XX:7072,XXX.XX.XXX.XX:7072,XXX.XX.XXX.XX:7072",
      "bootstrap.servers" -> "XXX.XX.XXX.XX:7072,XXX.XX.XXX.XX:7072,XXX.XX.XXX.XX:7072",
      "auto.offset.reset" -> "smallest")
    val testEvents = KafkaUtils
      .createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, testTopicMap)
      .map(_._2)

    testEvents.print()
    
    17/01/02 12:23:15 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    

Update #2:

I found this related topic. The suggested solution there was: Fixed it by setting the property 'advertised.host.name' as instructed by the comments in the kafka configuration (config/server.properties). Do I understand correctly that config/server.properties should be changed on the remote machine where Kafka is installed?
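Yes, config/server.properties lives on the machine running the broker. A sketch of the relevant entries, assuming a Kafka 0.10 broker and reusing the question's masked address as a placeholder:

```properties
# config/server.properties on the remote Kafka machine (values are placeholders)

# Interface and port the broker binds to:
listeners=PLAINTEXT://0.0.0.0:7072

# Address the broker hands back to clients; it must be reachable from the
# Spark driver and executors:
advertised.listeners=PLAINTEXT://XXX.XX.XXX.XX:7072

# Older-style equivalents mentioned in the linked answer:
#advertised.host.name=XXX.XX.XXX.XX
#advertised.port=7072
```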

Kafka : How to connect kafka-console-consumer to fetch remote broker topic content?

1 Answer:

Answer 0 (score: 0)

I think I ran into the same problem (EOFException) recently, and the cause was a Kafka version mismatch.

If I look here, https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka_2.10/1.6.2, the compile-time Kafka dependency of the streaming-kafka module is 0.8, while you are using 0.10.

As far as I know, 0.9 is already incompatible with 0.8. Could you try setting up a local 0.8 or 0.9 broker and connecting to that?
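
The other direction, if downgrading the broker is not an option, would be to move to a Spark build whose Kafka integration is compiled against the 0.10 client. A sketch of the Maven coordinates (the version shown is illustrative and must match your Spark cluster):

```xml
<!-- spark-streaming-kafka-0-10 targets the Kafka 0.10 client API.
     Version 2.0.2 is illustrative; use your cluster's Spark version. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
```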