Filter and sort based on properties in the terms lookup document in Elastic Search

Time: 2019-09-19 14:46:55

Tags: elasticsearch

I have some documents in my index:

POST "/index/thing/_bulk" -s -d'
    { "index":{ "_id": 1 } }
    { "title":"One thing"}
    { "index":{ "_id": 2 } }
    { "title":"Second thing"}
    { "index":{ "_id": 3 } }
    { "title":"Three things"}
    { "index":{ "_id": 4 } }
    { "title":"And so fourth"}
    { "index":{ "_id": 5 } }
    { "title":"Five things"}
'

I also have documents that hold users' collections. Each collection links to other documents (things) through an id property, like this:

PUT /index/collection/1
{
    "items": [
        {"id": 1, "time_added": "2017-08-07T09:07:15.000Z", "condition": "fair"},
        {"id": 3, "time_added": "2019-08-07T09:07:15.000Z", "condition": "good"},
        {"id": 4, "time_added": "2016-08-07T09:07:15.000Z", "condition": "poor"}
    ]
}

Then I use a terms lookup to get all the things in a user's collection, like this:

GET /documents/_search
{
    "query" : {
        "terms" : {
            "_id" : {
                "index" : "index",
                "type" : "collection",
                "id" : 1,
                "path" : "items.id"
            }
        }
    }
}

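This works fine. I get the three documents from the collection and can search, sort, and run aggregations as needed.

But is there a way to aggregate, filter, and sort these documents based on properties in the collection document (in this case time_added and condition)? Say I want to sort by time_added, or filter on condition == "good" in the collection?

Perhaps a script could be applied to the collection to sort or filter the items in it? This feels like it is getting very close to SQL, e.g. a left join, so maybe Elastic Search is the wrong tool?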

1 answer:

Answer 0: (score: 0)

It seems you need the nested data type.
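Why the nested type matters: without it, Elasticsearch flattens an array of objects into one multi-valued field per property, so the link between one item's condition and the same item's time_added is lost. The following is only a plain-Python sketch of that behaviour using the collection data from the question; the helper names (flatten, matches_flattened, matches_nested) are made up for illustration and are not part of any Elasticsearch API.

```python
from datetime import datetime

# The "items" array from the collection document in the question.
items = [
    {"id": 1, "time_added": "2017-08-07T09:07:15.000Z", "condition": "fair"},
    {"id": 3, "time_added": "2019-08-07T09:07:15.000Z", "condition": "good"},
    {"id": 4, "time_added": "2016-08-07T09:07:15.000Z", "condition": "poor"},
]

def parse(ts):
    """Parse the timestamp format used in the question."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

def flatten(items):
    """Mimic the default object mapping: one list of values per property."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault("items." + key, []).append(value)
    return flat

def matches_flattened(items, condition, before):
    """Each clause is checked against all values, regardless of which item they came from."""
    flat = flatten(items)
    has_condition = condition in flat["items.condition"]
    has_date = any(parse(t) <= before for t in flat["items.time_added"])
    return has_condition and has_date

def matches_nested(items, condition, before):
    """Both clauses must hold for the same item, as with a nested query."""
    return any(
        i["condition"] == condition and parse(i["time_added"]) <= before
        for i in items
    )

cutoff = datetime(2017, 1, 1)
print(matches_flattened(items, "good", cutoff))  # True: "good" and the 2016 date come from different items
print(matches_nested(items, "good", cutoff))     # False: the only "good" item was added in 2019
```

The cutoff of 2017-01-01 is chosen here only to make the cross-item match visible; it is not the date used in the queries of this answer.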

Using your data as an example:

Without nested type

Query (you get incorrect results: expected one, got five):

GET collection/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "items.condition": {
              "value": "good"
            }
          }
        },
        {
          "range": {
            "items.time_added": {
              "lte": "2019-09-01"
            }
          }
        }
      ]
    }
  }
}

Aggregation (wrong results: look at the first bucket, "2016-08-01T00:00:00.000Z"; it contains 3 CONDITION sub-buckets, one for each condition type):

GET collection/_search
{
  "size": 0,
  "aggs": {
    "DATE": {
      "date_histogram": {
        "field": "items.time_added",
        "calendar_interval": "month"
      },
      "aggs": {
        "CONDITION": {
          "terms": {
            "field": "items.condition.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

With nested type

DELETE collection

PUT collection
{
  "mappings": {
    "properties": {
      "items": {
        "type": "nested"
      }
    }
  }
}

# and POST the same data from above

Query (returns only one result):

GET collection/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "items.condition": {
                  "value": "good"
                }
              }
            },
            {
              "range": {
                "items.time_added": {
                  "lte": "2019-09-01"
                }
              }
            }
          ]
        }
      }
    }
  }
}

Aggregation (the first date bucket contains only one CONDITION sub-bucket):

GET collection/_search
{
  "size": 0,
  "aggs": {
    "ITEMS": {
      "nested": {
        "path": "items"
      },
      "aggs": {
        "DATE": {
          "date_histogram": {
            "field": "items.time_added",
            "calendar_interval": "month"
          },
          "aggs": {
            "CONDITION": {
              "terms": {
                "field": "items.condition.keyword",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
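Note that these queries return collection documents, not the thing documents, and a terms lookup only carries the ids across. If the things themselves need to be ordered or filtered by the collection's properties, one possible approach (an assumption on my part, not something Elasticsearch does for you) is to join on the client side after fetching both. A minimal sketch, where join_things is a hypothetical helper and the hits are assumed to have the usual "_id" / "_source" shape with string ids:

```python
def join_things(things, items, condition=None, newest_first=False):
    """Attach collection properties to thing hits, then filter and sort.

    things: search hits, each assumed to look like {"_id": "1", "_source": {...}}
    items:  the "items" array of the collection document
    """
    by_id = {hit["_id"]: hit["_source"] for hit in things}
    selected = [
        item for item in items
        if condition is None or item["condition"] == condition
    ]
    # ISO-8601 timestamps in the same format and time zone sort correctly as strings.
    selected.sort(key=lambda item: item["time_added"], reverse=newest_first)
    return [
        {**by_id[str(item["id"])],
         "time_added": item["time_added"],
         "condition": item["condition"]}
        for item in selected
        if str(item["id"]) in by_id
    ]

# Hits as they might come back from the terms lookup query in the question.
things = [
    {"_id": "1", "_source": {"title": "One thing"}},
    {"_id": "3", "_source": {"title": "Three things"}},
    {"_id": "4", "_source": {"title": "And so fourth"}},
]
items = [
    {"id": 1, "time_added": "2017-08-07T09:07:15.000Z", "condition": "fair"},
    {"id": 3, "time_added": "2019-08-07T09:07:15.000Z", "condition": "good"},
    {"id": 4, "time_added": "2016-08-07T09:07:15.000Z", "condition": "poor"},
]

print([t["title"] for t in join_things(things, items)])
# ['And so fourth', 'One thing', 'Three things']
print([t["title"] for t in join_things(things, items, condition="good")])
# ['Three things']
```

This is fine for small collections; for large ones, denormalizing the collection properties onto the thing documents may be a better fit.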

Hope that helps :)