Question

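I am trying to read from Elasticsearch using the Spark-ES API provided by ES. I expected Spark to make the queries faster, but it is actually very slow, even slower than using the ES REST API.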

I measured the speed of both the REST API and the Spark API and found the Spark API to be much slower.

Here is my Spark configuration:

SparkConf sparkConf = new SparkConf()
                .setAppName("readEs")
                .setMaster("local[*]")
                .set("es.index.auto.create", "true")
                .set("es.nodes", "55.13.9.136")
                .set("es.port", "9200")
                .set("es.nodes.wan.only", "true");

SparkSession sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(sparkSession.sparkContext());

Here is the query part:

// esRDD is statically imported from org.elasticsearch.spark.rdd.api.java.JavaEsSpark
JavaRDD<Map<String, Object>> searchRdd = esRDD(jsc, "20190904", o.toJSONString()).values();

for (Map<String, Object> item : searchRdd.collect()) {
    item.forEach((key, value) -> {
        System.out.println("search key:" + key + ", search value: " + value);
    });
}

Here is how I use the ES REST API:

client = RestClient.builder(new HttpHost("55.13.9.136",9200,"http")).build();
entity = new NStringEntity(o.toJSONString(), ContentType.APPLICATION_JSON);
Response response = client.performRequest("GET","/20190904/_search",params,entity);
responseBody = EntityUtils.toString(response.getEntity());
JSONObject jsonObject = JSON.parseObject(responseBody);
System.out.println("result: "+jsonObject.get("hits"));

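I am searching the data by timestamp with a time interval of 10,000 ms; covering about 30 minutes, the loop runs 180 times. The Spark API takes about 36,000 ms, while the ES REST API takes about 2,500 ms. I also found that if I remove the searchRdd.collect() or searchRdd.count() line, the Spark API takes only about 500 ms. I don't know whether the query actually runs in that case. If it does run, how can I get the data without calling those methods on searchRdd? Thanks.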
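
A note on the 500 ms figure: Spark RDDs are evaluated lazily, so defining an RDD does no reading by itself, and the work only happens once an action such as collect() or count() is called. The sketch below illustrates that behaviour in plain Java. It does not use Spark or Elasticsearch at all; java.util.stream is only a stand-in here, and the class LazyEvaluationDemo and the helper fetch are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyEvaluationDemo {
    // Records every simulated "fetch" so we can see when work really happens.
    public static final List<Integer> fetched = new ArrayList<>();

    // Hypothetical stand-in for a remote read; not part of Spark or ES.
    public static int fetch(int id) {
        fetched.add(id);
        return id * 10;
    }

    // Defining the pipeline only describes the work; nothing is fetched yet.
    // This plays the role of esRDD(...).values() in the question.
    public static Stream<Integer> definePipeline() {
        return Stream.of(1, 2, 3).map(LazyEvaluationDemo::fetch);
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = definePipeline();
        System.out.println("fetches after defining: " + fetched.size());   // 0

        // The terminal operation plays the role of collect() on the RDD:
        // only now does fetch() run for each element.
        List<Integer> results = pipeline.collect(Collectors.toList());
        System.out.println("fetches after collecting: " + fetched.size()); // 3
        System.out.println("results: " + results);                         // [10, 20, 30]
    }
}
```

If the same reasoning applies to the code in the question, the 500 ms run would measure only the setup of the RDD, not the query itself, so it is not directly comparable with the 2,500 ms REST figure.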

Why is reading from ES with Spark so slow?

0 answers: