From Spark

Time: 2016-04-10 07:23:44

Tags: java apache-spark spark-streaming elastic-map-reduce

I have a use case where I need to read messages from Kafka and, for each message, extract some data and query an Elasticsearch index; the response is then used for further processing. I get an error when calling JavaEsSpark.esJsonRDD:

    java.lang.ClassCastException: org.elasticsearch.spark.rdd.EsPartition incompatible with org.apache.spark.rdd.ParallelCollectionPartition
        at org.apache.spark.rdd.ParallelCollectionRDD.compute(ParallelCollectionRDD.scala:102)

My code snippet is below:
    public static void main(String[] args) {
        if (args.length < 4) {
            System.err.println("Usage: JavaKafkaIntegration <zkQuorum> <group> <topics> <numThreads>");
            System.exit(1);
        }

        SparkConf sparkConf = new SparkConf()
                .setAppName("JavaKafkaIntegration")
                .setMaster("local[2]")
                .set("spark.driver.allowMultipleContexts", "true");
        // Settings used by JavaEsSpark.esJsonRDD
        sparkConf.set("es.nodes", <NODE URL>);
        sparkConf.set("es.nodes.wan.only", "true");
        context = new JavaSparkContext(sparkConf);

        // Create the streaming context with a 2 second batch size
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

        int numThreads = Integer.parseInt(args[3]);
        Map<String, Integer> topicMap = new HashMap<>();
        String[] topics = args[2].split(",");
        for (String topic : topics) {
            topicMap.put(topic, numThreads);
        }

        // Receive messages from Kafka
        JavaPairReceiverInputDStream<String, String> messages =
                KafkaUtils.createStream(jssc, args[0], args[1], topicMap);

        JavaDStream<String> jsons = messages
                .map(new Function<Tuple2<String, String>, String>() {
                    private static final long serialVersionUID = 1L;

                    @Override
                    public String call(Tuple2<String, String> tuple2) {
                        // The error occurs on this call
                        JavaRDD<String> esRDD =
                                JavaEsSpark.esJsonRDD(context, <index>, <search string>).values();
                        return null;
                    }
                });

        jsons.print();
        jssc.start();
        jssc.awaitTermination();
    }

The error occurs when calling JavaEsSpark.esJsonRDD. Is this the right approach? How can I successfully call ES from Spark? I am running Kafka and Spark on Windows and calling an external Elasticsearch index.
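For context on the error: JavaEsSpark.esJsonRDD creates a new RDD, which requires the driver-side SparkContext, yet here it is invoked inside a map function that runs on the executors — nesting RDD operations like this is not supported in Spark. A common alternative is to query Elasticsearch's REST `_search` endpoint from the executor code with a plain HTTP client instead of building an RDD per message. Below is a minimal sketch of building such a request URL; the class name, host, index, and query values are all hypothetical, not from the original post:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: build an Elasticsearch URI-search URL for one query.
// Such a URL could be fetched with any HTTP client inside map/mapPartitions,
// avoiding the driver-only SparkContext entirely.
public class EsQueryUrl {
    static String searchUrl(String host, String index, String query) {
        // URL-encode the query so characters like ':' survive the request line
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "http://" + host + "/" + index + "/_search?q=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("localhost:9200", "messages", "user:alice"));
        // prints http://localhost:9200/messages/_search?q=user%3Aalice
    }
}
```

Fetching the resulting URL per partition (rather than per record) keeps connection overhead down and returns a plain JSON string that can be parsed in the same map function, so the response is available for the further processing the use case describes.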

0 Answers:

There are no answers