Question

Elasticsearch 2.1.1. The index contains records of athletes' jumps, and each athlete makes several attempts. A document has the following structure:

{
   "event_at": "2015-01-01T12:12:10",  // date of the jump
   "user_id": 2142,                    // athlete's id
   "distance": 4                       // result
}

I need to get the following result:

{"distance_range":
 {"*-5": 12,    // number of unique athletes whose maximum jump is in the range 0 to 5
  "6-10": 14,   // number of unique athletes whose maximum jump is in the range 6 to 10
  "11-15": 5    // number of unique athletes whose maximum jump is in the range 11 to 15
 }
}

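I managed to get the maximum jump for each athlete, but I don't know how to get the result one level up, i.e. how to count the athletes by the range their maximum falls into.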

In SQL it could look like this:

SELECT `distance_range`, COUNT(*) FROM (
  SELECT
    `user_id`,
    IF(MAX(`distance`) <= 5,
      '*-5',
      IF(MAX(`distance`) >= 6 AND MAX(`distance`) <= 10,
        '6-10',
        '11-15'
      )
    ) AS `distance_range`
  FROM `events`
  GROUP BY `user_id`
) t
GROUP BY `distance_range`;
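For comparison, the same two-step logic (maximum per athlete, then a count of athletes per range) can be written over an in-memory list of jump documents. This is a minimal Python sketch, assuming the documents are plain dicts with the `user_id` and `distance` fields shown above and integer distances; the function names are hypothetical. Like the SQL, it puts every maximum above 10 into `'11-15'`.

```python
from collections import Counter


def range_label(best):
    """Map an athlete's best distance to a label, using the same tests as the SQL IF chain."""
    if best <= 5:
        return "*-5"
    if 6 <= best <= 10:
        return "6-10"
    return "11-15"  # like the SQL, everything else lands here


def distance_ranges(jumps):
    # Step 1, the inner query: MAX(distance) ... GROUP BY user_id
    best = {}
    for jump in jumps:
        uid = jump["user_id"]
        best[uid] = max(best.get(uid, jump["distance"]), jump["distance"])
    # Step 2, the outer query: COUNT(*) ... GROUP BY distance_range
    return {"distance_range": dict(Counter(range_label(d) for d in best.values()))}


jumps = [
    {"event_at": "2015-01-01T12:12:10", "user_id": 2142, "distance": 4},
    {"event_at": "2015-01-01T12:13:40", "user_id": 2142, "distance": 7},
    {"event_at": "2015-01-01T12:14:05", "user_id": 17, "distance": 3},
    {"event_at": "2015-01-01T12:15:30", "user_id": 99, "distance": 12},
]
print(distance_ranges(jumps))
```

Athlete 2142 counts once, in `'6-10'`, because only the best of the two attempts is used.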

Answer 1

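I asked this question on the official Elasticsearch forum. At the moment the problem cannot be solved with the standard tools, because for a query like the following: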

'aggregations' => [
  'distance_range' => [
    'terms' => [
      'field' => 'doc.user_id'
    ],
    'aggregations' => [
      'max_distance' => [
        'max' => [
          'field' => 'doc.distance'
        ]
      ]
    ]
  ]
]

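Elasticsearch 2.1 has no pipeline aggregation that could take the per-athlete `max_distance` values and bucket them by range or by terms.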

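There are several possible ways to solve this problem:

1. Create an additional index that holds the maximum result of each athlete.
2. Use scripts.
3. Aggregate the results on the client side.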

I used the third approach.
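The client-side step can be sketched in Python as follows. It assumes the response to the `terms`/`max` query above has already been decoded into a dict, with one `terms` bucket per athlete (`key` is the `user_id`) and the athlete's best jump under `max_distance.value`. The function names and the sample response are hypothetical. Note that a `terms` aggregation returns only 10 buckets by default; in Elasticsearch 2.x, setting `'size' => 0` on it returns all terms, which this approach needs in order to see every athlete.

```python
from collections import Counter

# Ranges from the question as (label, inclusive upper bound), checked in order.
RANGES = [("*-5", 5), ("6-10", 10), ("11-15", 15)]


def range_label(best):
    """Return the label of the first range whose upper bound covers `best`, else None."""
    for label, upper in RANGES:
        if best <= upper:
            return label
    return None  # above 15: outside every requested range


def count_athletes_by_range(response):
    # One terms bucket per athlete; max_distance holds that athlete's best jump.
    buckets = response["aggregations"]["distance_range"]["buckets"]
    counts = Counter()
    for bucket in buckets:
        best = bucket["max_distance"]["value"]
        if best is None:  # a max aggregation over no values returns null
            continue
        label = range_label(best)
        if label is not None:
            counts[label] += 1
    # Report every range, including empty ones, in the order they are defined.
    return {"distance_range": {label: counts[label] for label, _ in RANGES}}


response = {
    "aggregations": {
        "distance_range": {
            "buckets": [
                {"key": 2142, "doc_count": 3, "max_distance": {"value": 7.0}},
                {"key": 17, "doc_count": 1, "max_distance": {"value": 4.0}},
                {"key": 99, "doc_count": 2, "max_distance": {"value": 12.0}},
            ]
        }
    }
}
print(count_athletes_by_range(response))
```

Unlike the SQL, this sketch drops maximums above 15 instead of counting them in `'11-15'`; extending `RANGES` is the place to change that.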

The first option has a big drawback: to keep the additional index up to date, you have to maintain it yourself. So this solution did not suit me.
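To illustrate that maintenance burden: an index holding one "best jump" entry per athlete has to be updated on every new jump, otherwise the ranges are computed from stale data, and a deleted or corrected jump may require recomputing the maximum from scratch (not handled here). A minimal sketch, with a plain Python dict standing in for that additional index; the class `BestJumpIndex` and its methods are hypothetical and do not call Elasticsearch.

```python
class BestJumpIndex:
    """In-memory stand-in for an additional index of per-athlete best jumps."""

    def __init__(self):
        self._best = {}  # user_id -> best distance seen so far

    def on_new_jump(self, jump):
        # Must run for every jump written to the main index; a missed call
        # leaves this "index" out of date.
        uid, distance = jump["user_id"], jump["distance"]
        if uid not in self._best or distance > self._best[uid]:
            self._best[uid] = distance

    def best(self, user_id):
        return self._best.get(user_id)


idx = BestJumpIndex()
for jump in [
    {"user_id": 2142, "distance": 4},
    {"user_id": 2142, "distance": 7},
    {"user_id": 2142, "distance": 5},
]:
    idx.on_new_jump(jump)
print(idx.best(2142))
```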

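The second option also has significant limitations: the complexity of the computation, or its effect on the query, can noticeably increase response time. In addition, the code has to be maintained in several systems.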
