I'm playing with Elasticsearch to see whether it can cover most of my scenarios. I've reached the point where I'm wondering how to achieve in Elasticsearch some results that are very simple to get in SQL.
Here is an example.
In Elasticsearch I have an index with these documents:
{ "Id": 1, "Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160101, "BestBeforeDate": 20160102, "BiteBy": "John" }
{ "Id": 2, "Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160102, "BestBeforeDate": 20160104, "BiteBy": "Mat" }
{ "Id": 3, "Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160103, "BestBeforeDate": 20160105, "BiteBy": "Mark" }
{ "Id": 4, "Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160104, "BestBeforeDate": 20160201, "BiteBy": "Simon" }
{ "Id": 5, "Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160112, "BestBeforeDate": 20160112, "BiteBy": "John" }
{ "Id": 6, "Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160114, "BestBeforeDate": 20160116, "BiteBy": "Mark" }
{ "Id": 7, "Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160120, "BestBeforeDate": 20160121, "BiteBy": "Simon" }
{ "Id": 8, "Fruit": "Kiwi", "BoughtInStore": "Shop", "BoughtDate": 20160121, "BestBeforeDate": 20160121, "BiteBy": "Mark" }
{ "Id": 9, "Fruit": "Kiwi", "BoughtInStore": "Jungle", "BoughtDate": 20160121, "BestBeforeDate": 20160121, "BiteBy": "Simon" }
If I wanted to know, for a specific date range, how many distinct fruits each person bit into per store, in SQL I would write something like this:
SELECT
COUNT(DISTINCT kpi.Fruit) as Fruits,
kpi.BoughtInStore,
kpi.BiteBy
FROM
(
SELECT f1.Fruit, f1.BoughtInStore, f1.BiteBy
FROM FruitsTable f1
WHERE f1.BoughtDate = (
SELECT MAX(f2.BoughtDate)
FROM FruitsTable f2
WHERE f1.Fruit = f2.Fruit
and f2.BoughtDate between 20160101 and 20160131
and (f2.BestBeforeDate between 20160101 and 20160131)
)
) kpi
GROUP BY kpi.BoughtInStore, kpi.BiteBy
The result would be something like this:
{ "Fruits": 1, "BoughtInStore": "Jungle", "BiteBy": "Mark" }
{ "Fruits": 1, "BoughtInStore": "Shop", "BiteBy": "Mark" }
{ "Fruits": 2, "BoughtInStore": "Jungle", "BiteBy": "Simon" }
Do you have any idea how I can get the same result in Elasticsearch with aggregations?
In short, the problems I'm facing in Elasticsearch are:
- How to prepare a portion of the data before aggregating (like, in this example, the latest row in the range for each Fruit)
- How to group the results on multiple fields
Thanks
Answer 0 (Score: 2)
As far as I know, there is no way to reference the results of an aggregation in a filter within the same query. So with a single query you can only solve part of the puzzle:
GET /purchases/fruits/_search
{
"query": {
"filtered":{
"filter": {
"range": {
"BoughtDate": {
"gte": "2015-01-01", //assuming you have right mapping for dates
"lte": "2016-03-01"
}
}
}
}
},
"sort": { "BoughtDate": { "order": "desc" }},
"aggs": {
"byBoughtDate": {
"terms": {
"field": "BoughtDate",
"order" : { "_term" : "desc" }
},
"aggs": {
"distinctCount": {
"cardinality": {
"field": "Fruit"
}
}
}
}
}
}
So you will have all the documents within the date range, and the aggregation buckets will be ordered by term, so the maximum date will be at the top. The client can then parse that first bucket (both the count and the value) and fetch the documents for that date value. For the distinct fruit count, you just use the nested cardinality aggregation.
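For example (just a sketch, not part of the answer above), if the client reads the key of the top bucket as 2016-01-21, the follow-up request for that day's documents could look like this, using the same filtered-query style:
GET /purchases/fruits/_search
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "BoughtDate": {
            "gte": "2016-01-21",
            "lte": "2016-01-21"
          }
        }
      }
    }
  }
}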
Yes, the query returns way more information than you need, but that's life :)
Answer 1 (Score: 1)
Naturally, there is no direct route from SQL to the Elasticsearch DSL, but there are some pretty common correlations.
For starters, any GROUP BY / HAVING is going to come down to an aggregation. The query DSL can generally cover (and often more than cover) normal query semantics.
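For instance (a minimal sketch, not from the original answer, reusing the grocery index and store type defined later in this answer), a plain SQL GROUP BY BiteBy with a COUNT(*) corresponds roughly to a single terms aggregation on BiteBy:
GET /grocery/store/_search
{
  "size": 0,
  "aggs": {
    "group_by_person": {
      "terms": {
        "field": "BiteBy"
      }
    }
  }
}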
"How to prepare a portion of the data before aggregating (like, in this example, the latest row in the range for each Fruit)"
So you're really asking for two things.
"How to prepare a portion of the data before aggregating"
This is the query phase.
"(like, in this example, the latest row in the range for each Fruit)"
You're technically asking it to aggregate to get the answer for this example: that's not a normal query. In your example, you're effectively getting the MAX via a GROUP BY to get it.
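As a rough sketch of that correspondence (not from the original answer, again assuming the grocery/store index defined later), the SQL MAX(BoughtDate) ... GROUP BY Fruit shape maps to a terms aggregation on Fruit with a max sub-aggregation:
GET /grocery/store/_search
{
  "size": 0,
  "aggs": {
    "group_by_fruit": {
      "terms": {
        "field": "Fruit"
      },
      "aggs": {
        "latest_bought_date": {
          "max": {
            "field": "BoughtDate"
          }
        }
      }
    }
  }
}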
"How to group the results on multiple fields"
It depends. Do you want them tiered (generally, yes) or do you want them combined?
If you want them tiered, then you just use sub-aggregations to get what you want. If you want them combined, then you generally just use a filters aggregation for the different groupings, as in the sketch below.
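For the combined case, a filters aggregation sketch might look like this; the bucket names and the store/person pairs are purely illustrative and not part of the original answer:
GET /grocery/store/_search
{
  "size": 0,
  "aggs": {
    "group_by_store_and_person": {
      "filters": {
        "filters": {
          "jungle_simon": {
            "bool": {
              "must": [
                { "term": { "BoughtInStore": "Jungle" } },
                { "term": { "BiteBy": "Simon" } }
              ]
            }
          },
          "shop_mark": {
            "bool": {
              "must": [
                { "term": { "BoughtInStore": "Shop" } },
                { "term": { "BiteBy": "Mark" } }
              ]
            }
          }
        }
      }
    }
  }
}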
Putting it all back together: you want the latest purchases per fruit, given a certain filtered date range. The date ranges are just normal queries / filters:
{
"query": {
"bool": {
"filter": [
{
"range": {
"BoughtDate": {
"gte": "2016-01-01",
"lte": "2016-01-31"
}
}
},
{
"range": {
"BestBeforeDate": {
"gte": "2016-01-01",
"lte": "2016-01-31"
}
}
}
]
}
}
}
With this, no document outside the date range on either field will be included in the request (effectively an AND). Because I used a filter, it is unscored and cacheable.
Now you need to start aggregating to get the rest of the information. Let's start by assuming the documents have been filtered with the filter above, to simplify what we're looking at. We'll combine it all at the end.
{
"size": 0,
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "BoughtDate",
"interval": "day",
"min_doc_count": 1
},
"aggs": {
"group_by_store": {
"terms": {
"field": "BoughtInStore"
},
"aggs": {
"group_by_person": {
"terms": {
"field": "BiteBy"
}
}
}
}
}
}
}
}
You want "size": 0 at the top level because you don't actually care about the hits. You only want the aggregated results.
Your first aggregation actually groups by the most recent date. I changed it a little to make it more realistic (each day), but it's effectively the same. Given the way you're using MAX, we could use a terms aggregation with "size": 1, but this is truer to what you'd want when dates (and possibly times!) get involved. I also asked it to ignore days in the matched documents that have no data (since the histogram runs from the start to the end, we don't actually care about those days).
If you really only wanted the last day, then you could use a pipeline aggregation to drop everything except the maximum bucket, but a realistic use of this type of request would want the full date range.
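For reference, here is a sketch of the terms-with-"size": 1 variant mentioned above (ES 2.x syntax, ordering by term descending; not the approach the rest of this answer uses):
GET /grocery/store/_search
{
  "size": 0,
  "aggs": {
    "latest_bought_date": {
      "terms": {
        "field": "BoughtDate",
        "size": 1,
        "order": { "_term": "desc" }
      },
      "aggs": {
        "group_by_store": {
          "terms": {
            "field": "BoughtInStore"
          },
          "aggs": {
            "group_by_person": {
              "terms": {
                "field": "BiteBy"
              }
            }
          }
        }
      }
    }
  }
}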
So we then continue by grouping by store, which is what you want. Then we sub-group by person (BiteBy). This implicitly gives you the counts.
Putting it all back together:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"BoughtDate": {
"gte": "2016-01-01",
"lte": "2016-01-31"
}
}
},
{
"range": {
"BestBeforeDate": {
"gte": "2016-01-01",
"lte": "2016-01-31"
}
}
}
]
}
},
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "BoughtDate",
"interval": "day",
"min_doc_count": 1
},
"aggs": {
"group_by_store": {
"terms": {
"field": "BoughtInStore"
},
"aggs": {
"group_by_person": {
"terms": {
"field": "BiteBy"
}
}
}
}
}
}
}
}
Note: this is how I indexed the data.
PUT /grocery/store/_bulk
{"index":{"_id":"1"}}
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-01","BestBeforeDate":"2016-01-02","BiteBy":"John"}
{"index":{"_id":"2"}}
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-02","BestBeforeDate":"2016-01-04","BiteBy":"Mat"}
{"index":{"_id":"3"}}
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-03","BestBeforeDate":"2016-01-05","BiteBy":"Mark"}
{"index":{"_id":"4"}}
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-04","BestBeforeDate":"2016-02-01","BiteBy":"Simon"}
{"index":{"_id":"5"}}
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-12","BestBeforeDate":"2016-01-12","BiteBy":"John"}
{"index":{"_id":"6"}}
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-14","BestBeforeDate":"2016-01-16","BiteBy":"Mark"}
{"index":{"_id":"7"}}
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-20","BestBeforeDate":"2016-01-21","BiteBy":"Simon"}
{"index":{"_id":"8"}}
{"Fruit":"Kiwi","BoughtInStore":"Shop","BoughtDate":"2016-01-21","BestBeforeDate":"2016-01-21","BiteBy":"Mark"}
{"index":{"_id":"9"}}
{"Fruit":"Kiwi","BoughtInStore":"Jungle","BoughtDate":"2016-01-21","BestBeforeDate":"2016-01-21","BiteBy":"Simon"}
It's critical that the string values you want to aggregate on (store and person) are not_analyzed strings (keyword in ES 5.0)! Otherwise they will use what is called fielddata, and that's not a good thing.
In ES 1.x / ES 2.x, the mapping looks like this:
PUT /grocery
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"store": {
"properties": {
"Fruit": {
"type": "string",
"index": "not_analyzed"
},
"BoughtInStore": {
"type": "string",
"index": "not_analyzed"
},
"BiteBy": {
"type": "string",
"index": "not_analyzed"
},
"BestBeforeDate": {
"type": "date"
},
"BoughtDate": {
"type": "date"
}
}
}
}
}
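For comparison (a sketch not included in the original answer), the ES 5.0 equivalent mapping would use the keyword type instead of not_analyzed strings:
PUT /grocery
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "store": {
      "properties": {
        "Fruit": { "type": "keyword" },
        "BoughtInStore": { "type": "keyword" },
        "BiteBy": { "type": "keyword" },
        "BestBeforeDate": { "type": "date" },
        "BoughtDate": { "type": "date" }
      }
    }
  }
}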
With all of that, you get the answer:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_date": {
"buckets": [
{
"key_as_string": "2016-01-01T00:00:00.000Z",
"key": 1451606400000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-02T00:00:00.000Z",
"key": 1451692800000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Mat",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-03T00:00:00.000Z",
"key": 1451779200000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Mark",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-12T00:00:00.000Z",
"key": 1452556800000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-14T00:00:00.000Z",
"key": 1452729600000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Mark",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-20T00:00:00.000Z",
"key": 1453248000000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Simon",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-21T00:00:00.000Z",
"key": 1453334400000,
"doc_count": 2,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Simon",
"doc_count": 1
}
]
}
},
{
"key": "Shop",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Mark",
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
}