Question

Is there a way to make Elasticsearch take sequence gaps into account when grouping?

Suppose the following data has been bulk-imported into Elasticsearch:

{ "index": { "_index": "test", "_type": "groupingTest", "_id": "1" } }
{ "sequence": 1, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "2" } }
{ "sequence": 2, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "3" } }
{ "sequence": 3, "type": "B" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "4" } }
{ "sequence": 4, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "5" } }
{ "sequence": 5, "type": "A" }

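Is there a way to query this data such that

documents with sequence numbers 1 and 2 go to one output group,
the document with sequence number 3 goes to another, and
documents with sequence numbers 4 and 5 go to a third group

...taking into account that the run of type A documents is interrupted by a type B item (or any other item that is not of type A)?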

I would like the resulting buckets to look like this (the name and values of sequence_group may differ; I am only trying to illustrate the logic):

"buckets": [
    {
       "key": "a",
       "sequence_group": 1,
       "doc_count": 2
    },
    {
       "key": "b",
       "sequence_group": 3,
       "doc_count": 1
    },
    {
       "key": "a",
       "sequence_group": 4,
       "doc_count": 2
    }
]
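
To make the intended logic concrete, here is a minimal Python sketch of the grouping done outside Elasticsearch. The function name group_runs is hypothetical, and the lowercased key only mirrors the illustrative buckets above; it says nothing about how Elasticsearch itself would compute this.

```python
from itertools import groupby

# The five documents from the bulk request above.
docs = [
    {"sequence": 1, "type": "A"},
    {"sequence": 2, "type": "A"},
    {"sequence": 3, "type": "B"},
    {"sequence": 4, "type": "A"},
    {"sequence": 5, "type": "A"},
]


def group_runs(documents):
    """Group documents into runs of equal `type`, in `sequence` order."""
    ordered = sorted(documents, key=lambda d: d["sequence"])
    buckets = []
    # groupby starts a new group whenever the type changes, so a B
    # between two A runs splits them into separate buckets.
    for doc_type, run in groupby(ordered, key=lambda d: d["type"]):
        run = list(run)
        buckets.append({
            "key": doc_type.lower(),  # lowercased to mirror the buckets above
            "sequence_group": run[0]["sequence"],  # first sequence of the run
            "doc_count": len(run),
        })
    return buckets


if __name__ == "__main__":
    for bucket in group_runs(docs):
        print(bucket)
```

The point of the sketch is that the grouping key is not the type alone but the type plus the position of the run, which is what a plain per-field grouping does not capture.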

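There is a good description of the problem, along with some SQL approaches to solving it, at https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/. I would like to know whether there is a solution for Elasticsearch as well.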

Answer 1

You can always run a terms aggregation and then apply a top hits aggregation to achieve this.

{{1}}
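
As a rough illustration of this approach (not the original snippet from this answer), the sketch below builds a request body with a terms aggregation on type and a top_hits sub-aggregation sorted by sequence, then splits each type bucket into consecutive runs on the client. The aggregation names by_type and docs, the size of 100, and the helper split_islands are all assumptions made for the example. It also assumes that sequence numbers are gapless integers across the whole index, so a gap inside one type means another type interrupted the run. Note that the terms aggregation alone puts all type A documents into one bucket, which is why the splitting step is needed.

```python
from itertools import groupby

# Hypothetical request body: one bucket per `type`, with that bucket's
# documents returned in ascending `sequence` order.
request_body = {
    "size": 0,
    "aggs": {
        "by_type": {
            "terms": {"field": "type"},
            "aggs": {
                "docs": {
                    "top_hits": {
                        "size": 100,  # arbitrary cap chosen for this sketch
                        "sort": [{"sequence": {"order": "asc"}}],
                        "_source": ["sequence"],
                    }
                }
            },
        }
    },
}


def split_islands(response):
    """Split each terms bucket into runs of consecutive sequence numbers."""
    islands = []
    for bucket in response["aggregations"]["by_type"]["buckets"]:
        hits = bucket["docs"]["hits"]["hits"]
        seqs = [hit["_source"]["sequence"] for hit in hits]
        # Within a consecutive run, (value - position) stays constant.
        for _, run in groupby(enumerate(seqs), key=lambda p: p[1] - p[0]):
            run = [seq for _, seq in run]
            islands.append({
                "key": bucket["key"],
                "sequence_group": run[0],
                "doc_count": len(run),
            })
    return sorted(islands, key=lambda island: island["sequence_group"])
```

Sending request_body to the search endpoint of the test index is left out here; split_islands only expects the parsed JSON response as a dictionary.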
