Unable to receive any messages from Kafka 0.10.0 with Spark Streaming 1.6.2

Time: 2017-01-31 16:38:24

Tags: apache-spark apache-kafka spark-streaming

We recently moved to HDP 2.5, which includes Kafka 0.10.0 and Spark 1.6.2, so I updated my pom and some of the API calls for the new Kafka. The code runs, but I never see any messages. I've added a code snippet below, along with my pom. I'm not sure what is going wrong here; can someone please help?

    // Local mode with two threads and a 2-second batch interval
    SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkApp");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

    // Receiver-based API: topic name -> number of consumer threads
    Map<String, Integer> topicMap = new HashMap<String, Integer>();
    topicMap.put(this.topic, this.numThreads);

    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", kfkBroker);
    kafkaParams.put("zookeeper.connect", zkBroker);
    kafkaParams.put("group.id", "default");
    kafkaParams.put("fetch.message.max.bytes", "60000000");

    JavaPairReceiverInputDStream<String, String> kafkaInStream = KafkaUtils.createStream(
        jssc,
        String.class,
        String.class,
        kafka.serializer.StringDecoder.class,
        kafka.serializer.StringDecoder.class,
        kafkaParams,
        topicMap,
        StorageLevel.MEMORY_AND_DISK());

    kafkaInStream.foreachRDD(new VoidFunction<JavaPairRDD<String, String>>()
    {
        private static final long serialVersionUID = 1L;

        @Override
        public void call(JavaPairRDD<String, String> v1) throws Exception
        {
            System.out.println("inside call.. JavaPairRDD size  " + v1.count());
            for (Tuple2<String, String> test : v1.collect())
            {
                // forward each record's value to the enclosing class's field
                eventMessage.setMessage(test._2);
            }
        }
    });

    // jssc.start() and jssc.awaitTermination() are invoked elsewhere;
    // the batches do run, as the output below shows
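
For comparison, spark-streaming-kafka 1.6 also offers a receiver-less direct stream, which reads from the brokers without going through a ZooKeeper-backed receiver and often makes a "no data arriving" situation easier to diagnose. A minimal sketch reusing kfkBroker and the same topic field (an illustrative alternative, not code from the original post):

    Map<String, String> directParams = new HashMap<>();
    directParams.put("metadata.broker.list", kfkBroker);
    // hypothetical choice: start from the earliest offsets so old test data is visible
    directParams.put("auto.offset.reset", "smallest");

    Set<String> topics = new HashSet<>(Arrays.asList(this.topic));

    JavaPairInputDStream<String, String> directStream = KafkaUtils.createDirectStream(
        jssc,
        String.class,
        String.class,
        kafka.serializer.StringDecoder.class,
        kafka.serializer.StringDecoder.class,
        directParams,
        topics);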

The output is always "inside call.. JavaPairRDD size 0", meaning Spark never reads any data. I tried pushing some data into the topic with the console producer, but that didn't help.
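
For reference, pushing test records from the shell looks roughly like this (a sketch; the broker address, ZooKeeper address, and topic name are placeholders, not values from the post):

    # send a few test messages to the topic
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic

    # independently confirm the topic actually holds data
    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning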

Here is my pom.xml (only the dependencies are included):

    <dependencies>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.10.1.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.10</artifactId>
            <version>0.10.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.2</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>1.6.2</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.10</artifactId>
            <version>1.6.2</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20160810</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>com.101tec</groupId>
            <artifactId>zkclient</artifactId>
            <version>0.8</version>
        </dependency>
    </dependencies>
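
Note the mix here: kafka-clients and kafka_2.10 are pinned to 0.10.1.1, while spark-streaming-kafka_2.10 1.6.2 is built against the 0.8 client. One way to see which Kafka client actually wins on the classpath is Maven's dependency tree (standard Maven tooling, not a command from the post):

    # list the resolved Kafka artifacts to spot version conflicts
    mvn dependency:tree -Dincludes=org.apache.kafka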

1 Answer:

Answer 0 (score: 1)

spark-streaming-kafka_2.10 1.6.2 only works with the Kafka 0.8 client. You can still use the Kafka 0.8 client to connect to a 0.10+ cluster, but you lose some performance.

I'd suggest simply submitting your application with --packages, so you don't have to declare Kafka in your dependencies at all, e.g.:

    bin/spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 ...
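
In practice, "avoid declaring Kafka in your dependencies" means dropping the explicit kafka-clients and kafka_2.10 0.10.1.1 entries from the pom, since they conflict with the 0.8 client that spark-streaming-kafka_2.10 1.6.2 expects; the connector itself can stay at provided scope for compilation while spark-submit supplies it (and its matching Kafka client) at runtime. A sketch of the relevant part of the pom (my reading of the suggestion, not part of the original answer):

    <!-- kafka-clients and kafka_2.10 removed: the connector below pulls in
         the Kafka 0.8 client it was built against -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>1.6.2</version>
        <!-- compile against the connector; at runtime spark-submit's
             packages flag provides it on the cluster -->
        <scope>provided</scope>
    </dependency>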