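I am using Scala 2.12 and Elasticsearch 6.5, and I query ES with the Java High Level REST Client.
As a simple example of the data I need: a document contains two sets of data with different ids and timestamps (it was published twice).
The ids are id_123 and id_234 (two different ids of the required document), and the timestamps (shown here only illustratively) are 10 AM for id_123 and 11 AM for id_234. So I need only the latest of these documents, i.e. the 11 AM one.
I have some filter conditions, and then I need to group by field1 and take the max of field2 (which is the timestamp).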
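import org.elasticsearch.action.search.SearchRequest
import org.elasticsearch.client.RequestOptions
import org.elasticsearch.index.query.QueryBuilders
import org.elasticsearch.search.aggregations.AggregationBuilders
import org.elasticsearch.search.builder.SearchSourceBuilder

// `client` is assumed to be an already-configured RestHighLevelClient.
val searchRequest = new SearchRequest("index_name")
val searchSourceBuilder = new SearchSourceBuilder()
// Filters: both match clauses must hold, plus at least one of the two regexps.
val qb = QueryBuilders.boolQuery()
  .must(QueryBuilders.matchQuery("myfield.date", "2019-07-02"))
  .must(QueryBuilders.matchQuery("myfield.data", "1111"))
  .must(QueryBuilders.boolQuery()
    .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex1"))
    .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex2"))
  )
// Group by Id, then take the max timestamp per group (this is the part that fails).
val myAgg = AggregationBuilders.terms("group_by_Id")
  .field("field1.Id")
  .subAggregation(AggregationBuilders.max("timestamp").field("field1.timeStamp"))
searchSourceBuilder.query(qb)
searchSourceBuilder.aggregation(myAgg)
searchSourceBuilder.size(1000)
searchRequest.source(searchSourceBuilder)
val searchResponse = client.search(searchRequest, RequestOptions.DEFAULT)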
Basically, everything works fine if I do not use the aggregation.
When I use the aggregation, I get the following error:
ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Expected numeric type on field [field1.timeStamp], but got [keyword]]]
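So what am I missing here? I am basically looking for a SQL-like query that has filters (WHERE with AND/OR clauses), then groups by a field (Id) and fetches only the documents where timeStamp is the max.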
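To make the intended result concrete, here is a minimal in-memory Scala sketch of the semantics I am after. It is not an Elasticsearch query. The `Doc` case class, its field names, the group key "A", and `latestPerGroup` are hypothetical, and the sketch assumes the timestamps are strings whose lexicographic order matches chronological order (for example ISO-8601).

```scala
// Hypothetical in-memory model of a hit that already passed the filters.
final case class Doc(id: String, groupKey: String, timeStamp: String)

object LatestPerGroup {
  // Group by key and keep, per group, the document with the greatest timestamp.
  // Assumption: timestamp strings sort lexicographically in chronological order.
  def latestPerGroup(docs: Seq[Doc]): Map[String, Doc] =
    docs.groupBy(_.groupKey).map { case (key, group) =>
      key -> group.maxBy(_.timeStamp)
    }

  def main(args: Array[String]): Unit = {
    val docs = Seq(
      Doc("id_123", "A", "2019-07-02T10:00:00"),
      Doc("id_234", "A", "2019-07-02T11:00:00")
    )
    // Only the 11:00 document (id_234) survives for group "A".
    latestPerGroup(docs).values.foreach(println)
  }
}
```

For the example above, the 10 AM document (id_123) is dropped and the 11 AM document (id_234) is kept.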
UPDATE:
I tried the query above with cURL from the command prompt and got the same error when using "max" in the aggregation.
{
  "query": {
    "bool": {
      "must": [
        {
          "match": { "myfield.date" : "2019-07-02" }
        },
        {
          "match": { "myfield.data" : "1111" }
        },
        {
          "bool": {
            "should": [
              {
                "regexp": { "myOtherFieldId": "myregex1" }
              },
              {
                "regexp": { "myOtherFieldId": "myregex2" }
              }
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "NAME" : {
      "terms": {
        "field": "field1.Id"
      },
      "aggs": {
        "NAME": {
          "max" : {
            "field": "field1.timeStamp"
          }
        }
      }
    }
  },
  "size": "10000"
}
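I get the same error.
I checked the mapping of the index, and it shows the field as keyword. So how can I perform a max operation on such a field?
Adding the relevant mapping: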
{"index_name":{"mappings":{"data":{"dynamic_templates":[{"boolean_as_keyword":{"match":"*","match_mapping_type":"boolean","mapping":{"ignore_above":256,"type":"keyword"}}},{"double_as_keyword":{"match":"*","match_mapping_type":"double","mapping":{"ignore_above":256,"type":"keyword"}}},{"long_as_keyword":{"match":"*","match_mapping_type":"long","mapping":{"ignore_above":256,"type":"keyword"}}},{"string_as_keyword":{"match":"*","match_mapping_type":"string","mapping":{"ignore_above":256,"type":"keyword"}}}],"date_detection":false,"properties":{"header":{"properties":{"Id":{"type":"keyword","ignore_above":256},"otherId":{"type":"keyword","ignore_above":256},"someKey":{"type":"keyword","ignore_above":256},"dataType":{"type":"keyword","ignore_above":256},"processing":{"type":"keyword","ignore_above":256},"otherKey":{"type":"keyword","ignore_above":256},"sender":{"type":"keyword","ignore_above":256},"receiver":{"type":"keyword","ignore_above":256},"system":{"type":"keyword","ignore_above":256},"timeStamp":{"type":"keyword","ignore_above":256}}}}}}}}
UPDATE2:
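I think I need to aggregate on the keyword of timeStamp.
Note that timeStamp is a sub-field, i.e. it sits under field1. So the keyword syntax below does not seem to work, or I am missing something else.
"aggs": {
  "NAME" : {
    "terms": {
      "field": "field1.Id"
    },
    "aggs": {
      "NAME": {
        "max" : {
          "field": "field1.timeStamp.keyword"
        }
      }
    }
  }
}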
Now it fails with:
"Invalid aggregator order path [field1.timeStamp]. Unknown aggregation [field1]"