Elasticsearch - How to query for changes in a particular field's value over time

Time: 2015-11-19 02:35:13

Tags: elasticsearch lucene

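I'll preface this by saying I'm new to Elasticsearch, so there may be a simple answer. Nothing I've read so far has clicked well enough for me to achieve the following.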

Here is a very simplified scenario. I have a series of user activities like this:

timestamp: t0, user: mike, result: failed
timestamp: t1, user: anne, result: failed
timestamp: t2, user: bob,  result: success
timestamp: t3, user: tom,  result: success
timestamp: t4, user: jane, result: failed
timestamp: t5, user: anne, result: success
timestamp: t6, user: tom,  result: failed
timestamp: t7, user: jane, result: failed
timestamp: t8, user: mike, result: success

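I need to identify all users who had to work to get a successful result (I'm ignoring users who never succeeded). To do that, all I really need is to find the records of users who failed one or more times before succeeding.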

Given the sequence above, the result would be the records for users 'anne' and 'mike'.

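We ignore 'jane' because she never succeeded, and we ignore 'bob' because he never failed. We also ignore 'tom', because he succeeded first and then failed; that's a different case.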
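The rule described above can be checked in plain Python, outside Elasticsearch, as a reference for what any query should return. This is a minimal sketch, assuming t0..t8 are represented as the integers 0..8 and that "succeeded after failing" means a failure occurred before the user's first success; the helper name `users_who_failed_before_success` is hypothetical.

```python
# Sample events from the question, with t0..t8 written as integers 0..8 (an assumption).
EVENTS = [
    (0, "mike", "failed"),
    (1, "anne", "failed"),
    (2, "bob", "success"),
    (3, "tom", "success"),
    (4, "jane", "failed"),
    (5, "anne", "success"),
    (6, "tom", "failed"),
    (7, "jane", "failed"),
    (8, "mike", "success"),
]


def users_who_failed_before_success(events):
    """Return users whose first success was preceded by at least one failure.

    Hypothetical helper: walks the events in timestamp order and records,
    for each user, whether a failure was seen before their first success.
    """
    seen_failure = set()
    first_success_seen = set()
    result = []
    for _, user, outcome in sorted(events):
        if user in first_success_seen:
            continue  # only events up to the first success matter for this rule
        if outcome == "failed":
            seen_failure.add(user)
        elif outcome == "success":
            first_success_seen.add(user)
            if user in seen_failure:
                result.append(user)
    return sorted(result)


print(users_who_failed_before_success(EVENTS))  # ['anne', 'mike']
```

Tom is excluded because his failure comes after his first success, and jane and bob never reach the success-after-failure branch.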

I could do this fairly easily in SQL, but I'm struggling to do it in Elasticsearch.
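For comparison, the SQL route mentioned above might look something like this. It is a sketch using Python's built-in `sqlite3` module with an in-memory table; the table name `activity` and the columns `ts`, `username` and `outcome` are assumptions, not taken from the question. A user qualifies when their earliest failure comes before their earliest success.

```python
import sqlite3

# Sample data from the question; t0..t8 are stored as integers 0..8 (an assumption).
ROWS = [
    (0, "mike", "failed"), (1, "anne", "failed"), (2, "bob", "success"),
    (3, "tom", "success"), (4, "jane", "failed"), (5, "anne", "success"),
    (6, "tom", "failed"), (7, "jane", "failed"), (8, "mike", "success"),
]

# Hypothetical query: compare each user's earliest failure with their earliest
# success. Users with no success or no failure get NULL on one side, so the
# comparison is not true and HAVING filters them out.
QUERY = """
SELECT username
FROM activity
GROUP BY username
HAVING MIN(CASE WHEN outcome = 'failed' THEN ts END)
     < MIN(CASE WHEN outcome = 'success' THEN ts END)
ORDER BY username
"""


def struggling_users(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE activity (ts INTEGER, username TEXT, outcome TEXT)")
        conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
        return [name for (name,) in conn.execute(QUERY)]
    finally:
        conn.close()


print(struggling_users(ROWS))  # ['anne', 'mike']
```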

How would you write a query that answers this?

Or, even better, how could I rephrase my question to get the same result?

Thanks!

1 Answer:

Answer 0 (score: 2)

Great question. It took a bit of effort to figure out, but I got it working with the new bucket selector aggregation in ES 2.0.

To make it work, I had to change the timestamps to the "integer" type (but it also works with dates).
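Converting the question's `t0`..`t8` labels into integers is mechanical. A minimal sketch, assuming every line has exactly the `timestamp: tN, user: X, result: Y` shape used in the question; `parse_event` and `LINE_RE` are hypothetical names.

```python
import re

# Matches lines like "timestamp: t0, user: mike, result: failed"
# (assumes the exact layout used in the question, extra spaces allowed).
LINE_RE = re.compile(r"timestamp:\s*t(\d+),\s*user:\s*(\w+),\s*result:\s*(\w+)")


def parse_event(line):
    """Turn one activity line into a dict with an integer timestamp."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    ts, user, result = match.groups()
    return {"timestamp": int(ts), "user": user, "result": result}


print(parse_event("timestamp: t8, user: mike, result: success"))
# {'timestamp': 8, 'user': 'mike', 'result': 'success'}
```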

I created a simple index and added your data with a _bulk request:

PUT /test_index

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"timestamp": 0,"user":"mike","result":"failed"}
{"index":{"_id":2}}
{"timestamp": 1,"user":"anne","result":"failed"}
{"index":{"_id":3}}
{"timestamp": 2,"user":"bob","result":"success"}
{"index":{"_id":4}}
{"timestamp": 3,"user":"tom","result":"success"}
{"index":{"_id":5}}
{"timestamp": 4,"user":"jane","result":"failed"}
{"index":{"_id":6}}
{"timestamp": 5,"user":"anne","result":"success"}
{"index":{"_id":7}}
{"timestamp": 6,"user":"tom","result":"failed"}
{"index":{"_id":8}}
{"timestamp": 7,"user":"jane","result":"failed"}
{"index":{"_id":9}}
{"timestamp": 8,"user":"mike","result":"success"}
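The `_bulk` body above alternates an action line with a document line, and the bulk API requires the body to end with a newline. If you'd rather generate it than type it, a sketch along these lines builds an equivalent payload with the standard `json` module; `build_bulk_body` is a hypothetical name, and actually sending the request is left out.

```python
import json

EVENTS = [
    (0, "mike", "failed"), (1, "anne", "failed"), (2, "bob", "success"),
    (3, "tom", "success"), (4, "jane", "failed"), (5, "anne", "success"),
    (6, "tom", "failed"), (7, "jane", "failed"), (8, "mike", "success"),
]


def build_bulk_body(events):
    """Build newline-delimited JSON for the _bulk endpoint.

    Each document gets an action line ({"index": {"_id": n}}) followed by
    its source line; ids start at 1, matching the request above.
    """
    lines = []
    for doc_id, (ts, user, result) in enumerate(events, start=1):
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps({"timestamp": ts, "user": user, "result": result}))
    # The bulk API requires the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(EVENTS)
print(body.splitlines()[:2])
```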

Then I can get what you're asking for (I think) with the following query. Under the top-level "user_terms" aggregation, I set up three sub-aggregations:

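  • "failed_filter" selects documents with "result": "failed", and its sub-aggregation finds the maximum timestamp in that group;
  • "success_filter" selects documents with "result": "success", and its sub-aggregation finds the maximum timestamp in that group;
  • finally, "failed_lt_success_filter" keeps only the user buckets whose (maximum) failed timestamp is less than their (maximum) success timestamp.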
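To see what these three pieces compute per user, the same logic can be emulated in plain Python: group by user, take the maximum failed and maximum success timestamps, and keep a user only when the first is smaller. This is a sketch of the semantics, not Elasticsearch code, and `emulate_bucket_selector` is a hypothetical name. Users missing either value (jane, bob) are dropped here, which matches the buckets this answer's query returns.

```python
from collections import defaultdict

EVENTS = [
    (0, "mike", "failed"), (1, "anne", "failed"), (2, "bob", "success"),
    (3, "tom", "success"), (4, "jane", "failed"), (5, "anne", "success"),
    (6, "tom", "failed"), (7, "jane", "failed"), (8, "mike", "success"),
]


def emulate_bucket_selector(events):
    """Mimic terms -> filter/max -> bucket_selector for the sample data.

    For each user, compute the max timestamp of failed events and of success
    events, then keep the user only if both exist and failed < success.
    """
    max_ts = defaultdict(dict)  # user -> {"failed": max ts, "success": max ts}
    for ts, user, result in events:
        current = max_ts[user].get(result)
        if current is None or ts > current:
            max_ts[user][result] = ts

    selected = {}
    for user, maxima in max_ts.items():
        failed, success = maxima.get("failed"), maxima.get("success")
        # A missing maximum means the bucket is not selected (jane, bob).
        if failed is not None and success is not None and failed < success:
            selected[user] = (failed, success)
    return dict(sorted(selected.items()))


print(emulate_bucket_selector(EVENTS))  # {'anne': (1, 5), 'mike': (0, 8)}
```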

Phew.

POST /test_index/_search
{
   "size": 0,
   "aggregations": {
      "user_terms": {
         "terms": {
            "field": "user"
         },
         "aggs": {
            "failed_filter": {
               "filter": { "term": { "result": "failed" } },
               "aggs": {
                  "max_timestamp": { "max": { "field": "timestamp" } }
               }
            },
            "success_filter": {
               "filter": { "term": { "result": "success" } },
               "aggs": {
                  "max_timestamp": { "max": { "field": "timestamp" } }
               }
            },
            "failed_lt_success_filter": {
               "bucket_selector": {
                  "buckets_path": {
                     "failed_timestamp": "failed_filter.max_timestamp",
                     "success_timestamp": "success_filter.max_timestamp"
                  },
                  "script": "failed_timestamp < success_timestamp"
               }
            }
         }
      }
   }
}
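If the request is generated from code, the body can be assembled as a plain dictionary and serialized. A sketch, assuming the field and aggregation names used above; `filtered_max` and `build_query` are hypothetical helpers. The `script` string is kept in the ES 2.0 form from the answer; newer Elasticsearch versions expect the variables to be referenced as `params.failed_timestamp` and `params.success_timestamp` instead.

```python
import json


def filtered_max(result_value):
    """A filter aggregation on `result` with a max-of-timestamp sub-aggregation."""
    return {
        "filter": {"term": {"result": result_value}},
        "aggs": {"max_timestamp": {"max": {"field": "timestamp"}}},
    }


def build_query():
    """Assemble the search body from this answer as a Python dict."""
    return {
        "size": 0,
        "aggregations": {
            "user_terms": {
                "terms": {"field": "user"},
                "aggs": {
                    "failed_filter": filtered_max("failed"),
                    "success_filter": filtered_max("success"),
                    "failed_lt_success_filter": {
                        "bucket_selector": {
                            "buckets_path": {
                                "failed_timestamp": "failed_filter.max_timestamp",
                                "success_timestamp": "success_filter.max_timestamp",
                            },
                            # ES 2.0 inline script syntax, as in the answer.
                            "script": "failed_timestamp < success_timestamp",
                        }
                    },
                },
            }
        },
    }


print(json.dumps(build_query(), indent=3))
```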

Which returns:

{
   "took": 11,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 9,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "user_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "anne",
               "doc_count": 2,
               "success_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 5
                  }
               },
               "failed_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 1
                  }
               }
            },
            {
               "key": "mike",
               "doc_count": 2,
               "success_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 8
                  }
               },
               "failed_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 0
                  }
               }
            }
         ]
      }
   }
}
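Reading the answer out of that response only needs the bucket keys and, optionally, the two maxima. A minimal sketch, assuming the response has already been decoded from JSON into a dict; `selected_users` is a hypothetical helper, and the sample below is trimmed from the response above.

```python
def selected_users(response):
    """Return (user, max failed ts, max success ts) for each surviving bucket."""
    buckets = response["aggregations"]["user_terms"]["buckets"]
    return [
        (
            bucket["key"],
            bucket["failed_filter"]["max_timestamp"]["value"],
            bucket["success_filter"]["max_timestamp"]["value"],
        )
        for bucket in buckets
    ]


# Trimmed copy of the response shown above.
SAMPLE_RESPONSE = {
    "aggregations": {
        "user_terms": {
            "buckets": [
                {"key": "anne",
                 "success_filter": {"max_timestamp": {"value": 5}},
                 "failed_filter": {"max_timestamp": {"value": 1}}},
                {"key": "mike",
                 "success_filter": {"max_timestamp": {"value": 8}},
                 "failed_filter": {"max_timestamp": {"value": 0}}},
            ]
        }
    }
}

print(selected_users(SAMPLE_RESPONSE))  # [('anne', 1, 5), ('mike', 0, 8)]
```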

Here is the code I used to work through the problem:

http://sense.qbox.io/gist/06083e06191445a44610f32baf1dd45c7370401e