Elasticsearch Java High Level Client: group by and max

Posted: 2019-07-15 00:13:03

Tags: scala elasticsearch group-by

I am using Scala 2.12 and Elasticsearch 6.5, and I query ES with the High Level Java client.

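The data I need, shown as a simple example: a document contains two sets of data with different ids and timestamps (it was published twice).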

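ids: id_123 and id_234 (two different ids for the document I need), and timestamps (for illustration only): 10 AM for id_123 and 11 AM for id_234. So I only need the latest of these documents, i.e., the 11 AM one.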
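A minimal in-memory sketch of the expected result for this example. The `Doc` type and its fields are hypothetical and only model an id and a time of day; the sketch shows what "keep only the latest" means, not how Elasticsearch computes it:

```scala
import java.time.LocalTime

// Hypothetical model of the two published documents described above.
final case class Doc(id: String, timeStamp: LocalTime)

object LatestDoc {
  val docs: Seq[Doc] = Seq(
    Doc("id_123", LocalTime.of(10, 0)), // published at 10 AM
    Doc("id_234", LocalTime.of(11, 0))  // published at 11 AM
  )

  // Keep only the most recent document (LocalTime is Comparable, so it has an Ordering).
  val latest: Doc = docs.maxBy(_.timeStamp)

  def main(args: Array[String]): Unit =
    println(latest) // Doc(id_234,11:00)
}
```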

I have some filter conditions, and then I need to group by field1 and take the max of field2 (i.e., the timestamp).

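import org.elasticsearch.action.search.SearchRequest
import org.elasticsearch.client.RequestOptions
import org.elasticsearch.index.query.QueryBuilders
import org.elasticsearch.search.aggregations.AggregationBuilders
import org.elasticsearch.search.builder.SearchSourceBuilder

// `client` is an existing RestHighLevelClient; its setup is not shown here
val searchRequest = new SearchRequest("index_name")
val searchSourceBuilder = new SearchSourceBuilder()

// Filter: both match clauses must hold, plus at least one of the two regexp clauses
val qb = QueryBuilders.boolQuery()
      .must(QueryBuilders.matchQuery("myfield.date", "2019-07-02"))
      .must(QueryBuilders.matchQuery("myfield.data", "1111"))
      .must(QueryBuilders.boolQuery()
        .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex1"))
        .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex2"))
      )

// Group by field1.Id; inside each bucket, take the max of field1.timeStamp
val myAgg = AggregationBuilders.terms("group_by_Id")
  .field("field1.Id")
  .subAggregation(AggregationBuilders.max("timestamp").field("field1.timeStamp"))

searchSourceBuilder.query(qb)
searchSourceBuilder.aggregation(myAgg)
searchSourceBuilder.size(1000)

searchRequest.source(searchSourceBuilder)

val searchResponse = client.search(searchRequest, RequestOptions.DEFAULT)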

Basically, everything works fine if I don't use the aggregation.

With the aggregation, I get the following error:

ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Expected numeric type on field [field1.timeStamp], but got [keyword]]]

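So what am I missing here? I am basically looking for the equivalent of an SQL query with a filter (a WHERE clause with AND/OR conditions) that then groups by a field (Id) and returns only the document with the maximum timeStamp.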
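To pin down the intended semantics, here is a plain-Scala sketch of that filter, group-by and max pipeline over an in-memory collection. The `Record` type, its field names, and the assumption that `timeStamp` is a number (epoch milliseconds) are illustrative only; the literal values and regex placeholders come from the query above. As with Elasticsearch `regexp` queries, each pattern must match the whole value.

```scala
import scala.util.matching.Regex

// Hypothetical flattened document; timeStamp is assumed numeric (epoch millis).
final case class Record(date: String, data: String, otherFieldId: String, id: String, timeStamp: Long)

object GroupByMax {
  // Placeholder patterns from the question; matches() requires a full match.
  private val patterns: Seq[Regex] = Seq("myregex1".r, "myregex2".r)

  private def matchesAny(value: String): Boolean =
    patterns.exists(_.pattern.matcher(value).matches())

  // WHERE date = '2019-07-02' AND data = '1111' AND (otherFieldId ~ regex1 OR regex2)
  // GROUP BY id, keeping the whole record with the greatest timeStamp per id.
  def latestPerId(records: Seq[Record]): Map[String, Record] =
    records
      .filter(r => r.date == "2019-07-02" && r.data == "1111" && matchesAny(r.otherFieldId))
      .groupBy(_.id)
      .map { case (id, group) => id -> group.maxBy(_.timeStamp) }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Record("2019-07-02", "1111", "myregex1", "a", 100L),
      Record("2019-07-02", "1111", "myregex1", "a", 200L),
      Record("2019-07-02", "1111", "myregex2", "b", 150L),
      Record("2019-07-02", "2222", "myregex1", "b", 999L) // dropped by the data filter
    )
    latestPerId(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Unlike this sketch, an Elasticsearch `max` aggregation returns only the maximum value for each bucket, not the document that holds it; fetching the whole document per bucket is usually done with a `top_hits` sub-aggregation sorted on the timestamp with `size` 1.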

Update:

I tried the same query with cURL from the command prompt and got the same error when using "max" in the aggregation:

{
  "query": {
        "bool": {
            "must": [
                {
                    "match": { "myfield.date" : "2019-07-02" }
                },
                {
                    "match": { "myfield.data" : "1111" }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "regexp": { "myOtherFieldId": "myregex1" }
                            },
                            {
                                "regexp": { "myOtherFieldId": "myregex2" }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "aggs": {
        "NAME" : {
            "terms": { 
                "field": "field1.Id"
            },
            "aggs": {
                "NAME": {
                    "max" : {
                        "field": "field1.timeStamp"
                    }
                }
            }
        }
    },
    "size": "10000"
}

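I get the same error.

I checked the index mapping, and the field is shown as keyword. So how can I run a max on such a field?

Here is the relevant mapping: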

{
  "index_name": {
    "mappings": {
      "data": {
        "dynamic_templates": [
          { "boolean_as_keyword": { "match": "*", "match_mapping_type": "boolean", "mapping": { "ignore_above": 256, "type": "keyword" } } },
          { "double_as_keyword":  { "match": "*", "match_mapping_type": "double",  "mapping": { "ignore_above": 256, "type": "keyword" } } },
          { "long_as_keyword":    { "match": "*", "match_mapping_type": "long",    "mapping": { "ignore_above": 256, "type": "keyword" } } },
          { "string_as_keyword":  { "match": "*", "match_mapping_type": "string",  "mapping": { "ignore_above": 256, "type": "keyword" } } }
        ],
        "date_detection": false,
        "properties": {
          "header": {
            "properties": {
              "Id":         { "type": "keyword", "ignore_above": 256 },
              "otherId":    { "type": "keyword", "ignore_above": 256 },
              "someKey":    { "type": "keyword", "ignore_above": 256 },
              "dataType":   { "type": "keyword", "ignore_above": 256 },
              "processing": { "type": "keyword", "ignore_above": 256 },
              "otherKey":   { "type": "keyword", "ignore_above": 256 },
              "sender":     { "type": "keyword", "ignore_above": 256 },
              "receiver":   { "type": "keyword", "ignore_above": 256 },
              "system":     { "type": "keyword", "ignore_above": 256 },
              "timeStamp":  { "type": "keyword", "ignore_above": 256 }
            }
          }
        }
      }
    }
  }
}
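In this mapping every property, including `timeStamp`, is `keyword`, and the dynamic templates also map `double` and `long` values to `keyword`, so even a numeric timestamp would be indexed as a string; a `max` aggregation needs a numeric or date field. The plain-Scala sketch below assumes (the question does not show the actual values) that `timeStamp` holds fixed-width ISO-8601 UTC strings. It checks that, for such values, plain string ordering (for ASCII text, the same order keyword terms use) agrees with chronological ordering, and shows a non-fixed-width format where it does not:

```scala
import java.time.Instant

object KeywordTimestampOrder {
  // Assumed format: fixed-width ISO-8601 UTC strings.
  val isoValues: Seq[String] =
    Seq("2019-07-02T10:00:00Z", "2019-07-02T11:00:00Z", "2019-07-01T23:59:59Z")

  // Plain string comparison, as a keyword field would order these terms.
  val lexicographicMax: String = isoValues.max

  // Chronological comparison after parsing to a numeric epoch value,
  // which is what a date or numeric field would allow.
  val chronologicalMax: String = isoValues.maxBy(s => Instant.parse(s).toEpochMilli)

  // Counter-example: without fixed width, string order is wrong ('9' > '1').
  val hourStrings: Seq[String] = Seq("9:00", "11:00")
  val hourLexMax: String = hourStrings.max

  def main(args: Array[String]): Unit = {
    println(lexicographicMax == chronologicalMax) // true
    println(hourLexMax)                           // 9:00
  }
}
```

If the real values are fixed-width ISO-8601, sorting on the keyword field would put the latest value first; otherwise the timestamps would likely need to be indexed as `date` or a numeric type, which for an existing field means a new mapping and a reindex.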

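UPDATE 2:

I think I need to aggregate on the keyword version of timeStamp.

Note that timeStamp is a sub-field, i.e., it sits under field1. So the .keyword syntax below does not seem to work, or I am missing something else: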

"aggs": {
            "NAME" : {
                "terms": { 
                    "field": "field1.Id"
                },
                "aggs": {
                    "NAME": {
                        "max" : {
                            "field": "field1.timeStamp.keyword"
                        }
                    }
                }
            }
        }

Now it fails with:

"Invalid aggregator order path [field1.timeStamp]. Unknown aggregation [field1]"

0 answers
