Bulk update in Spark - Elasticsearch

Time: 2019-02-04 15:47:24

Tags: java apache-spark elasticsearch rdd elasticsearch-bulk-api

I am consuming events from a Kafka topic and pre-aggregating them before storing them in Elasticsearch. What I need is a way to update each document's counter with an upsert script when storing them in Elasticsearch.

Currently, to get things working, I am doing this:

if (!results.collect().isEmpty()) { // save to es
    AggEventEsDocument documentToUpdate = results.collect().get(0);
    String updateScript = "ctx._source.count += " + documentToUpdate.getCount();

    esSettings.put(ConfigurationOptions.ES_UPDATE_SCRIPT_INLINE, updateScript);
    JavaEsSpark.saveToEs(results, jobConfig.getEsIndexAndType(), esSettings);
}
((CanCommitOffsets) messages.inputDStream()).commitAsync(offsetRanges);
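
For reference, here is a rough, untested sketch of what the same saveToEs call might look like if each document supplied its own count through script parameters instead of concatenating one value into the script string. The pairId id field, the count:count mapping for es.update.script.params, and the baseEsSettings map are assumptions about my setup, not something I have verified:

// Untested sketch: pass each document's own "count" field as a script parameter,
// so no value has to be concatenated into the script string on the driver.
// "pairId", "count:count" and baseEsSettings are assumptions about this job.
Map<String, String> esSettings = new HashMap<>(baseEsSettings);
esSettings.put(ConfigurationOptions.ES_WRITE_OPERATION, "upsert");
esSettings.put(ConfigurationOptions.ES_MAPPING_ID, "pairId");
esSettings.put(ConfigurationOptions.ES_UPDATE_SCRIPT_LANG, "painless");
esSettings.put(ConfigurationOptions.ES_UPDATE_SCRIPT_INLINE, "ctx._source.count += params.count");
esSettings.put(ConfigurationOptions.ES_UPDATE_SCRIPT_PARAMS, "count:count"); // param name : document field
JavaEsSpark.saveToEs(results, jobConfig.getEsIndexAndType(), esSettings);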

With my current approach I only pick up the count of the first document. What I would like to do is something like this:

(I know this does not work; I am already getting a "Task not serializable" exception.)

results.foreach(event -> {
    String updateScript = "ctx._source.count += " + event.getCount();
    esSettings.put(ConfigurationOptions.ES_UPDATE_SCRIPT_INLINE, updateScript);
    JavaEsSpark.saveToEs(event, jobConfig.getEsIndexAndType(), esSettings);
});

Or maybe something like this:

BulkRequest bulkRequest = new BulkRequest();
results.foreach(event -> {
    UpdateRequest request = new UpdateRequest(jobConfig.getEsIndex(), jobConfig.getEsType(), event.getPairId());
    Map<String, Object> parameters = Collections.singletonMap("count", event.getCount());

    Script inline = new Script(ScriptType.INLINE, "painless", "ctx._source.count += params.count", parameters);
    request.script(inline);

    String jsonString = event.toString();
    request.upsert(jsonString, XContentType.JSON);

    bulkRequest.add(request);
});

client.bulk(bulkRequest, RequestOptions.DEFAULT);
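
For what it's worth, here is a rough, untested sketch of how that second idea might sidestep the serialization problem: do the bulk work inside foreachPartition and create the REST client on the executors, so nothing non-serializable is captured from the driver. The localhost:9200 host is a placeholder, and I am assuming event.toString() really produces the JSON source for the upsert:

import java.util.Collections;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;

// Copy plain strings out of jobConfig so the closure does not drag in non-serializable objects
final String esIndex = jobConfig.getEsIndex();
final String esType = jobConfig.getEsType();

results.foreachPartition(events -> {
    // Build the client on the executor, once per partition
    try (RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(new HttpHost("localhost", 9200, "http")))) { // host is a placeholder
        BulkRequest bulkRequest = new BulkRequest();
        while (events.hasNext()) {
            AggEventEsDocument event = events.next();
            UpdateRequest request = new UpdateRequest(esIndex, esType, event.getPairId());
            Map<String, Object> parameters = Collections.singletonMap("count", event.getCount());
            request.script(new Script(ScriptType.INLINE, "painless",
                    "ctx._source.count += params.count", parameters));
            request.upsert(event.toString(), XContentType.JSON); // assumes toString() is the JSON source
            bulkRequest.add(request);
        }
        if (bulkRequest.numberOfActions() > 0) {
            client.bulk(bulkRequest, RequestOptions.DEFAULT);
        }
    }
});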

0 Answers:

There are no answers yet.