Elasticsearch terms aggregation skips some entries

Asked: 2017-09-05 07:59:54

Tags: elasticsearch aggregation

We use Elasticsearch to collect SQL statistics. At one point we noticed that some entries do not appear in aggregations.

Here is a sample request (originally generated by Kibana):

POST /_msearch 
{"index":["stat-2017-09-04"],"ignore_unavailable":true,"preference":1504514752086}
{
   "query":{
      "bool":{
         "must":[
            {
               "query_string":{
                  "analyze_wildcard":true,
                  "query":"Group:spbpro.db.sql AND AppUserName:robot"
               }
            },
            {
               "range":{
                  "EndTime":{
                     "gte":1504503690000,
                     "lte":1504503692800,
                     "format":"epoch_millis"
                  }
               }
            }
         ],
         "must_not":[

         ]
      }
   },
   "aggs":{
      "3":{
         "terms":{
            "field":"Name.keyword",
            "size":5000,
            "order":{
               "1":"desc"
            }
         },
         "aggs":{
            "1":{
               "sum":{
                  "field":"TotalTime"
               }
            },
            "2":{
               "date_histogram":{
                  "field":"EndTime",
                  "interval":"20ms",
                  "time_zone":"Asia/Baghdad",
                  "min_doc_count":1
               },
               "aggs":{
                  "1":{
                     "sum":{
                        "field":"TotalTime"
                     }
                  }
               }
            }
         }
      }
   }
}

Here is the Elasticsearch response:

{
  "responses": [
    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 4.754195,
        "hits": [
          {
            "_index": "stat-2017-09-04",
            "_type": "stat-spbpro.db.sql",
            "_id": "AV5LaI15AUHnqGLtN2GS",
            "_score": 4.754195,
            "_source": {
              "Group": "spbpro.db.sql",
              "Name": "select * from (select a.IDPU, sum(d.COUNT)as CNT     from (         select IDPU, max(ID) as ID             from (select IDPU, ID from PARAMS where             IDTPPARAM in (select ID from TPPARAMS where IDTPARC=?))             where             ID in (select IDPARAM from DATA_1064_A where DTPU>=? and DTPU<=?)             group by IDPU         ) a     join DATA_1064_A d on d.IDPARAM=a.ID and DTPU>=? and DTPU<=?     group by IDPU) where IDPU in (select ID from TEMP_IDS where IDTYPE=1)",
              "StartTime": "2017-09-04T05:36:09.0559048Z",
              "EndTime": "2017-09-04T05:41:31.7295827Z",
              "TotalTime": 297761.8962,
              "Count": 13
            }
          },
          {
            "_index": "stat-2017-09-04",
            "_type": "stat-spbpro.db.sql",
            "_id": "AV5LaI15AUHnqGLtN2OF",
            "_score": 4.7034826,
            "_source": {
              "Group": "spbpro.db.sql",
              "Name": "select IDPU, count(*) as HRSCNT from PUTEDATAS where DTFR>=? and DTFR<? and IDPU in (select ID from TEMP_IDS where IDTYPE=1) group by IDPU",
              "StartTime": "2017-09-04T05:37:06.2981554Z",
              "EndTime": "2017-09-04T05:41:32.7463729Z",
              "TotalTime": 4277.6874,
              "Count": 13
            }
          }
        ]
      },
      "aggregations": {
        "3": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "1": {
                "value": 4277
              },
              "2": {
                "buckets": [
                  {
                    "1": {
                      "value": 4277
                    },
                    "key_as_string": "2017-09-04T08:41:32.740+03:00",
                    "key": 1504503692740,
                    "doc_count": 1
                  }
                ]
              },
              "key": "select IDPU, count(*) as HRSCNT from PUTEDATAS where DTFR>=? and DTFR<? and IDPU in (select ID from TEMP_IDS where IDTYPE=1) group by IDPU",
              "doc_count": 1
            }
          ]
        }
      },
      "status": 200
    }
  ]
}

The aggregation contains a bucket for "select IDPU, count(*) as HRSCNT ...". That is correct.

But why is "select * from (select a.IDPU ..." listed only in the hits and missing from the aggregation?

The Elasticsearch version is 5.0.

1 Answer:

Answer 0 (score: 1)

I think your mapping probably looks something like this:

...
"Name": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
...

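This is the default mapping for strings when you do not set a mapping explicitly. It means that strings longer than 256 characters are not indexed in the keyword sub-field (and therefore do not show up in aggregations on Name.keyword). See the ignore_above docs. The source is still stored, so you can see these documents in search results and can still search on the analyzed field (Name).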
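To make the 256-character cutoff concrete, here is a minimal Python sketch that applies the same length rule to the two Name values from the hits in the question. It only mimics the rule and does not talk to Elasticsearch. The helper name `indexed_as_keyword` is made up for illustration, and `len()` is only an approximation of how Elasticsearch counts characters (it should be exact for plain ASCII SQL text like this).

```python
# Assumption: mirrors the default `ignore_above: 256` on the Name.keyword sub-field.
IGNORE_ABOVE = 256


def indexed_as_keyword(value: str, ignore_above: int = IGNORE_ABOVE) -> bool:
    """Return True if a keyword field with this ignore_above would index the value.

    Strings longer than ignore_above are skipped by the keyword field.
    Hypothetical helper: it imitates the rule, it is not an Elasticsearch API.
    """
    return len(value) <= ignore_above


# The two Name values copied from the hits in the question.
long_name = "select * from (select a.IDPU, sum(d.COUNT)as CNT     from (         select IDPU, max(ID) as ID             from (select IDPU, ID from PARAMS where             IDTPPARAM in (select ID from TPPARAMS where IDTPARC=?))             where             ID in (select IDPARAM from DATA_1064_A where DTPU>=? and DTPU<=?)             group by IDPU         ) a     join DATA_1064_A d on d.IDPARAM=a.ID and DTPU>=? and DTPU<=?     group by IDPU) where IDPU in (select ID from TEMP_IDS where IDTYPE=1)"
short_name = "select IDPU, count(*) as HRSCNT from PUTEDATAS where DTFR>=? and DTFR<? and IDPU in (select ID from TEMP_IDS where IDTYPE=1) group by IDPU"

for name in (long_name, short_name):
    status = "indexed" if indexed_as_keyword(name) else "skipped"
    print(f"{status}: {name[:40]}...")
```

The long statement is reported as skipped and the short one as indexed, which matches the single bucket in the response.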

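You can solve the problem by creating the mapping explicitly and omitting ignore_above. You will have to reindex your data (you cannot change the existing mapping); the reindex API makes this easy. If you only care about searching this field as a keyword (and do not want the analyzed field), you can also use just a keyword field, like this: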

...
"Name": {
  "type": "keyword"
}
...
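
As a rough sketch of the explicit-mapping-plus-reindex route described above, the Python snippet below only builds and prints the two request bodies and sends nothing to a cluster. The destination index name `stat-2017-09-04-v2` is hypothetical. The mapping type `stat-spbpro.db.sql` is taken from the `_type` shown in the response, and the layout with a mapping type under `mappings` assumes Elasticsearch 5.x.

```python
import json

SOURCE_INDEX = "stat-2017-09-04"    # index name from the question
DEST_INDEX = "stat-2017-09-04-v2"   # hypothetical name for the new index
DOC_TYPE = "stat-spbpro.db.sql"     # mapping type seen in the response hits

# Body for: PUT /<DEST_INDEX>
# Name stays an analyzed text field; its keyword sub-field has no ignore_above.
create_index_body = {
    "mappings": {
        DOC_TYPE: {
            "properties": {
                "Name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword"}},
                }
            }
        }
    }
}

# Body for: POST /_reindex (copies documents from the old index into the new one)
reindex_body = {
    "source": {"index": SOURCE_INDEX},
    "dest": {"index": DEST_INDEX},
}

print(f"PUT /{DEST_INDEX}")
print(json.dumps(create_index_body, indent=2))
print("POST /_reindex")
print(json.dumps(reindex_body, indent=2))
```

After reindexing, the same terms aggregation pointed at the new index should produce a bucket for the long statement as well, assuming the rest of the mapping is unchanged.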