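We use Elasticsearch to collect SQL statistics. At one point we noticed that some entries do not show up in aggregations.
Here is a sample request (originally generated by Kibana):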
POST /_msearch
{"index":["stat-2017-09-04"],"ignore_unavailable":true,"preference":1504514752086}
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "Group:spbpro.db.sql AND AppUserName:robot"
          }
        },
        {
          "range": {
            "EndTime": {
              "gte": 1504503690000,
              "lte": 1504503692800,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "3": {
      "terms": {
        "field": "Name.keyword",
        "size": 5000,
        "order": {
          "1": "desc"
        }
      },
      "aggs": {
        "1": {
          "sum": {
            "field": "TotalTime"
          }
        },
        "2": {
          "date_histogram": {
            "field": "EndTime",
            "interval": "20ms",
            "time_zone": "Asia/Baghdad",
            "min_doc_count": 1
          },
          "aggs": {
            "1": {
              "sum": {
                "field": "TotalTime"
              }
            }
          }
        }
      }
    }
  }
}
This is the Elasticsearch response:
{
  "responses": [
    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 4.754195,
        "hits": [
          {
            "_index": "stat-2017-09-04",
            "_type": "stat-spbpro.db.sql",
            "_id": "AV5LaI15AUHnqGLtN2GS",
            "_score": 4.754195,
            "_source": {
              "Group": "spbpro.db.sql",
              "Name": "select * from (select a.IDPU, sum(d.COUNT)as CNT from ( select IDPU, max(ID) as ID from (select IDPU, ID from PARAMS where IDTPPARAM in (select ID from TPPARAMS where IDTPARC=?)) where ID in (select IDPARAM from DATA_1064_A where DTPU>=? and DTPU<=?) group by IDPU ) a join DATA_1064_A d on d.IDPARAM=a.ID and DTPU>=? and DTPU<=? group by IDPU) where IDPU in (select ID from TEMP_IDS where IDTYPE=1)",
              "StartTime": "2017-09-04T05:36:09.0559048Z",
              "EndTime": "2017-09-04T05:41:31.7295827Z",
              "TotalTime": 297761.8962,
              "Count": 13
            }
          },
          {
            "_index": "stat-2017-09-04",
            "_type": "stat-spbpro.db.sql",
            "_id": "AV5LaI15AUHnqGLtN2OF",
            "_score": 4.7034826,
            "_source": {
              "Group": "spbpro.db.sql",
              "Name": "select IDPU, count(*) as HRSCNT from PUTEDATAS where DTFR>=? and DTFR<? and IDPU in (select ID from TEMP_IDS where IDTYPE=1) group by IDPU",
              "StartTime": "2017-09-04T05:37:06.2981554Z",
              "EndTime": "2017-09-04T05:41:32.7463729Z",
              "TotalTime": 4277.6874,
              "Count": 13
            }
          }
        ]
      },
      "aggregations": {
        "3": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "1": {
                "value": 4277
              },
              "2": {
                "buckets": [
                  {
                    "1": {
                      "value": 4277
                    },
                    "key_as_string": "2017-09-04T08:41:32.740+03:00",
                    "key": 1504503692740,
                    "doc_count": 1
                  }
                ]
              },
              "key": "select IDPU, count(*) as HRSCNT from PUTEDATAS where DTFR>=? and DTFR<? and IDPU in (select ID from TEMP_IDS where IDTYPE=1) group by IDPU",
              "doc_count": 1
            }
          ]
        }
      },
      "status": 200
    }
  ]
}
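The aggregation contains a bucket for "select IDPU, count(*) as HRSCNT ...". That is correct.
But why is "select * from (select a.IDPU ..." listed only in the hits and missing from the aggregations?
The Elasticsearch version is 5.0.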
Answer 0 (score: 1)
I think your mapping probably looks like this:
...
"Name": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
}
...
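This is the default mapping for strings when you do not set a mapping explicitly. It means that strings longer than 256 characters are not indexed in the keyword field (and therefore do not show up in aggregations on Name.keyword). See the ignore_above docs. The source is still stored, so you can see such documents in the search results and can still search on the analyzed field (Name).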
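To make the effect concrete, here is a minimal Python sketch that emulates the ignore_above rule on the two Name values from the response in the question. It does not talk to Elasticsearch; the helper appears_in_keyword_terms is a hypothetical name introduced for illustration, and it assumes plain ASCII strings so that Python's len matches the length Elasticsearch compares against.

```python
# Emulation of the `ignore_above` rule for a keyword sub-field.
# Assumption: ASCII-only values, so len() equals the character count.

IGNORE_ABOVE = 256  # value assumed from the default dynamic string mapping

# The two Name values copied from the hits in the question.
LONG_NAME = (
    "select * from (select a.IDPU, sum(d.COUNT)as CNT from ( select IDPU, max(ID) as ID from "
    "(select IDPU, ID from PARAMS where IDTPPARAM in (select ID from TPPARAMS where IDTPARC=?)) "
    "where ID in (select IDPARAM from DATA_1064_A where DTPU>=? and DTPU<=?) group by IDPU ) a "
    "join DATA_1064_A d on d.IDPARAM=a.ID and DTPU>=? and DTPU<=? group by IDPU) "
    "where IDPU in (select ID from TEMP_IDS where IDTYPE=1)"
)
SHORT_NAME = (
    "select IDPU, count(*) as HRSCNT from PUTEDATAS where DTFR>=? and DTFR<? "
    "and IDPU in (select ID from TEMP_IDS where IDTYPE=1) group by IDPU"
)


def appears_in_keyword_terms(value: str, ignore_above: int = IGNORE_ABOVE) -> bool:
    """Return True if the value would be indexed in the keyword sub-field.

    Values longer than `ignore_above` are skipped at index time, so a terms
    aggregation on that sub-field never sees them.
    """
    return len(value) <= ignore_above


if __name__ == "__main__":
    for name in (LONG_NAME, SHORT_NAME):
        print(len(name), appears_in_keyword_terms(name), name[:40] + "...")
```

The first query is well over the 256-character limit, which matches what the question shows: it is returned as a hit (from _source) but produces no bucket in the terms aggregation on Name.keyword.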
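You can solve the problem by creating the mapping explicitly and omitting ignore_above. You will have to reindex your data (you cannot change the existing mapping); this is easy to do with the reindex API. If you only care about searching this field as a keyword (and you do not need the analyzed field), you can also use just a keyword field, like this: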
...
    "Name": {
        "type": "keyword"
    }
}
...
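The two steps (create a new index with an explicit mapping, then copy the data with the reindex API) could be sketched as follows. This only builds and prints the request bodies; it sends nothing. The destination index name stat-2017-09-04-v2 and the helper names are assumptions made for illustration, and the mapping type name is taken from the _type shown in the question's response.

```python
import json

SOURCE_INDEX = "stat-2017-09-04"
DEST_INDEX = "stat-2017-09-04-v2"  # hypothetical name for the new index
DOC_TYPE = "stat-spbpro.db.sql"    # taken from "_type" in the question


def build_create_index_body(keep_text_field: bool = True) -> dict:
    """Body for `PUT /<DEST_INDEX>` with an explicit mapping for Name.

    keep_text_field=True keeps the analyzed field plus a keyword sub-field
    without ignore_above; False maps Name as a plain keyword.
    """
    if keep_text_field:
        name_mapping = {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},  # no ignore_above
        }
    else:
        name_mapping = {"type": "keyword"}
    # Elasticsearch 5.x mappings are nested under the document type name.
    return {"mappings": {DOC_TYPE: {"properties": {"Name": name_mapping}}}}


def build_reindex_body() -> dict:
    """Body for `POST /_reindex`, copying documents into the new index."""
    return {"source": {"index": SOURCE_INDEX}, "dest": {"index": DEST_INDEX}}


if __name__ == "__main__":
    print("PUT /" + DEST_INDEX)
    print(json.dumps(build_create_index_body(), indent=2))
    print("POST /_reindex")
    print(json.dumps(build_reindex_body(), indent=2))
```

After reindexing, the aggregation would need to target the new index (Name.keyword with the first variant, Name with the keyword-only variant). One caveat with dropping ignore_above entirely: Lucene rejects single terms larger than 32766 bytes, so if extremely long SQL strings are possible, a larger but finite ignore_above may be the safer choice.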