Getting values from a Kafka topic using a Java Spark-Kafka direct stream

Time: 2016-11-23 09:31:53

Tags: java apache-spark apache-kafka spark-streaming kafka-consumer-api

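I am not getting any data from the queue using a Kafka direct stream. The System.out.println() statement I put in my code never runs, which means I am not receiving any data from the topic.

I am quite sure data is available in the queue, but nothing shows up in the console.

I don't see any errors in the console either.

Can anyone suggest something?

Here is my Java code: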

SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount11").setMaster("local[*]");
        sparkConf.set("spark.streaming.concurrentJobs", "3");

        // Create the context with a 3-second batch interval
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(3000));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "x.xx.xxx.xxx:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", true);

        Collection<String> topics = Arrays.asList("topicName");

        final JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));


        JavaPairDStream<String, String> lines = stream
                .mapToPair(new PairFunction<ConsumerRecord<String, String>, String, String>() {
                    @Override
                    public Tuple2<String, String> call(ConsumerRecord<String, String> record) {

                        return new Tuple2<>(record.key(), record.value());
                    }
                });

        lines.print();

        // System.out.println(lines.count());
        lines.foreachRDD(rdd -> {
            rdd.values().foreachPartition(p -> {
                while (p.hasNext()) {
                    System.out.println("Value of Kafka queue" + p.next());
                }
            });
        });

2 Answers:

Answer 0: (score: 1)

I was able to print the strings fetched from the Kafka queue using the direct Kafka stream.

Here is my code:

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import scala.Tuple2;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;

public final class KafkaConsumerDirectStream {

    public static void main(String[] args) throws Exception {

        try {
            SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount11").setMaster("local[*]");
            sparkConf.set("spark.streaming.concurrentJobs", "30");

            // Create the context with a 200 ms batch interval
            JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(200));

            Map<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "x.xx.xxx.xxx:9091");

            Set<String> topics = new HashSet<>();
            topics.add("PartWithTopic02Queue");

            JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(jssc, String.class,
                    String.class, StringDecoder.class, StringDecoder.class, kafkaParams, topics);

            JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
                @Override
                public String call(Tuple2<String, String> tuple2) {
                    return tuple2._2();
                }
            });

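            lines.foreachRDD(rdd -> {
                // collect() pulls the whole batch to the driver; fine for debugging, not for large volumes
                if (!rdd.isEmpty()) {
                    List<String> strArray = rdd.collect();

                    // Print each value received in this batch
                    strArray.forEach(System.out::println);
                }
            });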

            jssc.start();
            jssc.awaitTermination();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

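Answer 1: (score: 0)

@Vimal Here is a link to a working version of creating direct streams in Scala.

I believe that once you have looked at the Scala version, you should be able to convert it easily.

Also make sure the consumer is not set to read only the latest offsets from Kafka (for example, "auto.offset.reset" set to "latest", as in the question). With that setting it may not pick up messages that were already in the topic before the stream started.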
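To illustrate the offset point above, here is a minimal sketch of consumer parameters that start from the beginning of the topic instead of the latest offsets. The class name KafkaParamsSketch, the method earliestParams, and the generated group id are hypothetical and not from the question or the answers. Two settings are my assumptions: a fresh "group.id", because "auto.offset.reset" only applies when the group has no committed offsets, and "enable.auto.commit" set to false, where the question uses true. The sketch uses only the Java standard library, and the resulting map has the same Map<String, Object> shape as the kafkaParams the question passes to ConsumerStrategies.Subscribe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of the original post.
final class KafkaParamsSketch {

    // Builds consumer parameters that read from the earliest available offset
    // when the consumer group has no committed offsets yet.
    static Map<String, Object> earliestParams(String bootstrapServers, String groupId) {
        Map<String, Object> params = new HashMap<>();
        params.put("bootstrap.servers", bootstrapServers);
        // Kafka accepts a fully qualified class name here as well as a Class object.
        params.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        params.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        params.put("group.id", groupId);
        // "earliest" instead of the question's "latest".
        params.put("auto.offset.reset", "earliest");
        // Assumption: auto commit disabled so a debugging run does not move the group's offsets.
        params.put("enable.auto.commit", false);
        return params;
    }

    public static void main(String[] args) {
        // A fresh group id (assumption) guarantees there are no committed offsets to resume from.
        String groupId = "debug-" + UUID.randomUUID();
        Map<String, Object> params = earliestParams("x.xx.xxx.xxx:9092", groupId);
        params.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

If this map is used in place of the question's kafkaParams, the first batch should contain whatever is already stored in the topic, which makes it easier to tell "no data in the topic" apart from "stream not reading". Note that the question's snippet, as posted, does not show jssc.start() and jssc.awaitTermination(); without them no batch runs regardless of the offset settings.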