我有一个用例,我需要从kafka和每条消息中读取消息,提取数据并调用elasticsearch索引。响应将进一步用于进一步处理。 我在调用JavaEsSpark.esJsonRDD
时遇到错误java.lang.ClassCastException:org.elasticsearch.spark.rdd.EsPartition与org.apache.spark.rdd.ParallelCollectionPartition不兼容 在org.apache.spark.rdd.ParallelCollectionRDD.compute(ParallelCollectionRDD.scala:102)
我的代码段位于
之下 public static void main(String[] args) {
if (args.length < 4) {
System.err.println("Usage: JavaKafkaIntegration <zkQuorum> <group> <topics> <numThreads>");
System.exit(1);
}
SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaIntegration").setMaster("local[2]").set("spark.driver.allowMultipleContexts", "true");
//Setting when using JavaEsSpark.esJsonRDD
sparkConf.set("es.nodes",<NODE URL>);
sparkConf.set("es.nodes.wan.only","true");
context = new JavaSparkContext(sparkConf);
// Create the context with 2 seconds batch size
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));
int numThreads = Integer.parseInt(args[3]);
Map<String, Integer> topicMap = new HashMap<>();
String[] topics = args[2].split(",");
for (String topic: topics) {
topicMap.put(topic, numThreads);
}
//Receive Message From kafka
JavaPairReceiverInputDStream<String, String> messages =
KafkaUtils.createStream(jssc,args[0], args[1], topicMap);
JavaDStream<String> jsons = messages
.map(new Function<Tuple2<String, String>, String>() {
/**
*
*/
private static final long serialVersionUID = 1L;
@Override
public String call(Tuple2<String, String> tuple2){
JavaRDD<String> esRDD = JavaEsSpark.esJsonRDD(context, <index>,<search string> ).values() ;
return null;
}
});
jsons.print();
jssc.start();
jssc.awaitTermination();
}
调用JavaEsSpark.esJsonRDD时出错。这是正确的方法吗?如何从spark成功调用ES? 我在Windows上运行kafka和spark并调用外部弹性搜索索引。