How can I process only new data (indexed since the last run) in Elasticsearch?

Time: 2017-12-11 19:17:27

Tags: elasticsearch

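Is there a way to get the date and time at which an Elasticsearch document was indexed?

I am running ES queries via Spark and would prefer NOT to look through all the documents I have already processed. Instead, I would like to read only the documents that were ingested between the last time the program ran and now.

What is the most efficient way to do this?

I have looked at:

  • Updating each document to add a field holding an array of booleans that records which analyses have already seen it. The downside is having to wait for the updates to happen.
  • A time-based index approach, where the current index is split into smaller indices, for example one per hour. The downside I see is the number of open file descriptors.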
  • ...

Elasticsearch version 5.6

3 answers:

Answer 0: (score: 4)

I posted the question on the Elasticsearch discussion board, and using an ingest pipeline appears to be the best option.
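As a rough illustration of the ingest pipeline idea, the sketch below builds the body of a pipeline that uses a `set` processor to copy the `_ingest.timestamp` metadata into each document. The field name `ingested_at` and the helper `build_timestamp_pipeline` are hypothetical names chosen for this example; they do not come from the discussion board thread.

```python
import json


def build_timestamp_pipeline(field="ingested_at"):
    """Return an ingest pipeline body that stamps each document.

    `field` is a hypothetical field name; pick whatever suits the mapping.
    """
    return {
        "description": "Record the time at which a document was ingested",
        "processors": [
            # The set processor copies the ingest metadata timestamp
            # into the document under the chosen field.
            {"set": {"field": field, "value": "{{_ingest.timestamp}}"}}
        ],
    }


pipeline_body = build_timestamp_pipeline()
print(json.dumps(pipeline_body, indent=2))
```

A body like this would be registered with `PUT _ingest/pipeline/<id>` and applied by indexing with the `pipeline=<id>` request parameter. Whether the rendered timestamp string maps cleanly to a `date` field is worth verifying on the Elasticsearch version in use.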

Answer 1: (score: 2)

I am running ES queries via Spark and would prefer NOT to look through all the documents I have already processed. Instead, I would like to read only the documents that were ingested between the last time the program ran and now.

A workaround could be:

When data is inserted into Elasticsearch through Logstash, Logstash adds a @timestamp key to each document, which represents the time (in UTC) at which the document was created. Alternatively, an ingest pipeline can add such a field.

After that, we can query based on the timestamp.

For more on this, please have a look at:

  1. Mapping changes
  2. There is no way to ask ES to insert a timestamp at index time
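The timestamp query described in this answer could look roughly like the sketch below, which builds a `range` query for documents whose `@timestamp` falls after the previous run and up to now. The helper `build_new_docs_query` and the two example times are assumptions made for illustration; in practice the previous run time would come from wherever the program records it.

```python
import json
from datetime import datetime, timezone


def build_new_docs_query(last_run, now, field="@timestamp"):
    """Return a query body matching last_run < field <= now."""
    return {
        "query": {
            "range": {
                field: {
                    # Exclusive lower bound so documents handled by the
                    # previous run are not read a second time.
                    "gt": last_run.isoformat(),
                    "lte": now.isoformat(),
                }
            }
        }
    }


# Example values only; real code would load last_run from storage.
last_run = datetime(2017, 12, 11, 18, 0, 0, tzinfo=timezone.utc)
now = datetime(2017, 12, 11, 19, 0, 0, tzinfo=timezone.utc)

query_json = json.dumps(build_new_docs_query(last_run, now))
print(query_json)
```

When reading through elasticsearch-hadoop from Spark, a query DSL string like `query_json` can be supplied through the `es.query` setting, so that only the matching documents are loaded.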

Answer 2: (score: 1)

Elasticsearch has no such feature.

You need to store the date on each document yourself. You will then be able to search by date range.
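A minimal sketch of storing the date manually, assuming the indexing client is under your control: each document gets a timestamp field before it is sent, and the program records when it last ran so the next run knows where its date range starts. The field name `indexed_at`, the checkpoint file layout, and all three helper names are hypothetical.

```python
import json
import os
from datetime import datetime, timezone


def stamp(doc, now=None, field="indexed_at"):
    """Return a copy of doc with a UTC timestamp stored under field."""
    now = now or datetime.now(timezone.utc)
    stamped = dict(doc)  # shallow copy, the caller's dict is untouched
    stamped[field] = now.isoformat()
    return stamped


def load_last_run(path):
    """Return the saved timestamp string, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["last_run"]


def save_last_run(path, when):
    """Persist the time of this run for the next run to read."""
    with open(path, "w") as f:
        json.dump({"last_run": when.isoformat()}, f)
```

The value returned by `load_last_run` would serve as the lower bound of the date range search, and `save_last_run` would be called once a run has finished processing.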