Kafka topic details not showing up in Spark

Time: 2017-09-04 06:15:41

Tags: java apache-spark apache-kafka spark-streaming

I have written to a Kafka topic named my-topic and I am trying to fetch the topic's data in Spark. However, I am having trouble displaying the Kafka topic details because I get a long list of errors. I am using Java to fetch the data.

Here is my code:

public static void main(String s[]) throws InterruptedException{
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("Sampleapp");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "localhost:9092");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "Different id is allotted for different stream");
    kafkaParams.put("auto.offset.reset", "latest");
    kafkaParams.put("enable.auto.commit", false);

    Collection<String> topics = Arrays.asList("my-topic");

    final JavaInputDStream<ConsumerRecord<String, String>> stream =
      KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
      );

    JavaPairDStream<String, String> jPairDStream =  stream.mapToPair(
            new PairFunction<ConsumerRecord<String, String>, String, String>() {
                /**
                 * 
                 */
                private static final long serialVersionUID = 1L;

                @Override
                public Tuple2<String, String> call(ConsumerRecord<String, String> record) throws Exception {
                    return new Tuple2<>(record.key(), record.value());
                }
            });

    jPairDStream.foreachRDD(jPairRDD -> {
           jPairRDD.foreach(rdd -> {
                System.out.println("key="+rdd._1()+" value="+rdd._2());
            });
        });

    jssc.start();            
    jssc.awaitTermination(); 

    // Note: awaitTermination() blocks, so this second mapToPair only runs
    // once the streaming context has already been stopped.
    stream.mapToPair(
            new PairFunction<ConsumerRecord<String, String>, String, String>() {
                /**
                 * 
                 */
                private static final long serialVersionUID = 1L;

                @Override
                public Tuple2<String, String> call(ConsumerRecord<String, String> record) throws Exception {
                    return new Tuple2<>(record.key(), record.value());
                }
            });
}

Here is the error I am getting:

  

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/09/04 11:41:15 INFO SparkContext: Running Spark version 2.1.0
17/09/04 11:41:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/04 11:41:15 INFO SecurityManager: Changing view acls to: 11014525
17/09/04 11:41:15 INFO SecurityManager: Changing modify acls to: 11014525
17/09/04 11:41:15 INFO SecurityManager: Changing view acls groups to:
17/09/04 11:41:15 INFO SecurityManager: Changing modify acls groups to:
17/09/04 11:41:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(11014525); groups with view permissions: Set(); users with modify permissions: Set(11014525); groups with modify permissions: Set()
17/09/04 11:41:15 INFO Utils: Successfully started service 'sparkDriver' on port 56668.
17/09/04 11:41:15 INFO SparkEnv: Registering MapOutputTracker
17/09/04 11:41:15 INFO SparkEnv: Registering BlockManagerMaster
17/09/04 11:41:15 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/09/04 11:41:15 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/09/04 11:41:15 INFO DiskBlockManager: Created local directory at C:\Users\11014525\AppData\Local\Temp\blockmgr-cba489b9-2458-455a-8c03-4c4395a01d44
17/09/04 11:41:15 INFO MemoryStore: MemoryStore started with capacity 896.4 MB
17/09/04 11:41:16 INFO SparkEnv: Registering OutputCommitCoordinator
17/09/04 11:41:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/09/04 11:41:16 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.16.202.21:4040
17/09/04 11:41:16 INFO Executor: Starting executor ID driver on host localhost
17/09/04 11:41:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56689.
17/09/04 11:41:16 INFO NettyBlockTransferService: Server created on 172.16.202.21:56689
17/09/04 11:41:16 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/09/04 11:41:16 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.16.202.21, 56689, None)
17/09/04 11:41:16 INFO BlockManagerMasterEndpoint: Registering block manager 172.16.202.21:56689 with 896.4 MB RAM, BlockManagerId(driver, 172.16.202.21, 56689, None)
17/09/04 11:41:16 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.16.202.21, 56689, None)
17/09/04 11:41:16 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.16.202.21, 56689, None)
17/09/04 11:41:16 WARN KafkaUtils: overriding enable.auto.commit to false for executor
17/09/04 11:41:16 WARN KafkaUtils: overriding auto.offset.reset to none for executor
17/09/04 11:41:16 WARN KafkaUtils: overriding executor group.id to spark-executor-Different id is allotted for different stream
17/09/04 11:41:16 WARN KafkaUtils: overriding receive.buffer.bytes to 65536 see KAFKA-3135
17/09/04 11:41:16 INFO DirectKafkaInputDStream: Slide time = 10000 ms
17/09/04 11:41:16 INFO DirectKafkaInputDStream: Storage level = Serialized 1x Replicated
17/09/04 11:41:16 INFO DirectKafkaInputDStream: Checkpoint interval = null
17/09/04 11:41:16 INFO DirectKafkaInputDStream: Remember interval = 10000 ms
17/09/04 11:41:16 INFO DirectKafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka010.DirectKafkaInputDStream@23a3407b
17/09/04 11:41:16 INFO MappedDStream: Slide time = 10000 ms
17/09/04 11:41:16 INFO MappedDStream: Storage level = Serialized 1x Replicated
17/09/04 11:41:16 INFO MappedDStream: Checkpoint interval = null
17/09/04 11:41:16 INFO MappedDStream: Remember interval = 10000 ms
17/09/04 11:41:16 INFO MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@140030a9
17/09/04 11:41:16 INFO ForEachDStream: Slide time = 10000 ms
17/09/04 11:41:16 INFO ForEachDStream: Storage level = Serialized 1x Replicated
17/09/04 11:41:16 INFO ForEachDStream: Checkpoint interval = null
17/09/04 11:41:16 INFO ForEachDStream: Remember interval = 10000 ms
17/09/04 11:41:16 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@65041548
17/09/04 11:41:16 ERROR StreamingContext: Error starting the context, marking it as stopped
org.apache.kafka.common.config.ConfigException: Missing required configuration "partition.assignment.strategy" which has no default value.
    at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:124)
    at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:48)
    at org.apache.kafka.clients.consumer.ConsumerConfig.<init>(ConsumerConfig.java:194)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:380)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:363)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:350)
    at org.apache.spark.streaming.kafka010.Subscribe.onStart(ConsumerStrategy.scala:83)
    at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.consumer(DirectKafkaInputDStream.scala:75)
    at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.start(DirectKafkaInputDStream.scala:243)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$start$5.apply(DStreamGraph.scala:49)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$start$5.apply(DStreamGraph.scala:49)
    at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach_quick(ParArray.scala:143)
    at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:136)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
    at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    at ... run in separate thread using org.apache.spark.util.ThreadUtils ... ()
    at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:578)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:572)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.start(JavaStreamingContext.scala:556)
    at Json.ExcelToJson.SparkConsumingKafka.main(SparkConsumingKafka.java:56)
17/09/04 11:41:16 INFO ReceiverTracker: ReceiverTracker stopped
17/09/04 11:41:16 INFO JobGenerator: Stopping JobGenerator immediately
17/09/04 11:41:16 INFO RecurringTimer: Stopped timer for JobGenerator after time -1
17/09/04 11:41:16 INFO JobGenerator: Stopped JobGenerator
17/09/04 11:41:16 INFO JobScheduler: Stopped JobScheduler
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "partition.assignment.strategy" which has no default value.
    at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:124)
    at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:48)
    at org.apache.kafka.clients.consumer.ConsumerConfig.<init>(ConsumerConfig.java:194)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:380)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:363)
    at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:350)
    at org.apache.spark.streaming.kafka010.Subscribe.onStart(ConsumerStrategy.scala:83)
    at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.consumer(DirectKafkaInputDStream.scala:75)
    at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.start(DirectKafkaInputDStream.scala:243)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$start$5.apply(DStreamGraph.scala:49)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$start$5.apply(DStreamGraph.scala:49)
    at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach_quick(ParArray.scala:143)
    at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:136)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
    at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    at ... run in separate thread using org.apache.spark.util.ThreadUtils ... ()
    at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:578)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:572)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.start(JavaStreamingContext.scala:556)
    at Json.ExcelToJson.SparkConsumingKafka.main(SparkConsumingKafka.java:56)
17/09/04 11:41:16 INFO SparkContext: Invoking stop() from shutdown hook
17/09/04 11:41:16 INFO SparkUI: Stopped Spark web UI at http://172.16.202.21:4040
17/09/04 11:41:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/09/04 11:41:16 INFO MemoryStore: MemoryStore cleared
17/09/04 11:41:16 INFO BlockManager: BlockManager stopped
17/09/04 11:41:16 INFO BlockManagerMaster: BlockManagerMaster stopped
17/09/04 11:41:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/09/04 11:41:16 INFO SparkContext: Successfully stopped SparkContext
17/09/04 11:41:16 INFO ShutdownHookManager: Shutdown hook called
17/09/04 11:41:16 INFO ShutdownHookManager: Deleting directory C:\Users\11014525\AppData\Local\Temp\spark-37334cdc-9680-4801-8e50-ef3024ed1d8a

pom.xml:

  

<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.1.0</version>
</dependency>
<dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.6</version>
</dependency>
<dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.10</artifactId>
        <version>0.8.2.0</version>
</dependency>
<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.10</artifactId>
        <version>2.1.1</version>
</dependency>

1 Answer:

Answer 0 (score: 1):

From the log, your Spark version is 2.1.0. You have not shared the build file with the other dependencies. It looks like you have both spark-streaming-kafka-0-8_2.11-2.1.0.jar and spark-streaming-kafka-0-10_2.11-2.1.0.jar on the classpath, and the wrong class is being loaded. If you are using Maven, you need dependencies like the ones below. Please check and update your project.

<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.0</version>
</dependency>
<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.0</version>
</dependency>
<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.1.0</version>
</dependency>  
<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.1.0</version>
</dependency> 
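
If it is unclear whether the old 0-8 or the new 0-10 Kafka integration jar is actually being picked up after the change, a quick diagnostic is to print which jar the Kafka consumer classes are loaded from. This is only a sketch for debugging, not part of the fix; the class name KafkaClasspathCheck is an arbitrary choice and not part of your project:

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class KafkaClasspathCheck {
    public static void main(String[] args) {
        // Prints the jar that ConsumerConfig was loaded from, which shows
        // whether the 0.8 or the 0.10 Kafka client ended up on the classpath.
        System.out.println("ConsumerConfig loaded from: "
                + ConsumerConfig.class.getProtectionDomain().getCodeSource().getLocation());
    }
}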

EDIT:

I am editing my answer since you edited the question and posted your dependencies. You are using Kafka version 0.8.* while your spark-streaming-kafka dependency is for Kafka 0.10.*. Please use the same Kafka version for the Kafka dependency. Use the following dependency for org.apache.kafka:

<dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>0.10.2.0</version>
</dependency>
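
To confirm which Kafka client version Maven actually resolved after this change, the client library's own version string can be printed at runtime. Again just a sketch (KafkaVersionCheck is only an illustrative class name), assuming the kafka-clients jar is on the classpath:

import org.apache.kafka.common.utils.AppInfoParser;

public class KafkaVersionCheck {
    public static void main(String[] args) {
        // AppInfoParser.getVersion() returns the version of the Kafka client
        // library on the classpath, e.g. "0.10.2.0" once the versions match.
        System.out.println("Kafka client version: " + AppInfoParser.getVersion());
    }
}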