Distinct values of an array field that match a filter in Elasticsearch 2.4

Time: 2018-07-19 10:42:15

Tags: elasticsearch

In short: I want to find the distinct values of some field in my documents, BUT only the values that match a given filter. The problem is with array fields. Imagine I have the following documents in ES 2.4:

[
  {
    "states": [
      "Washington (US-WA)",
      "California (US-CA)"
    ]
  },
  {
    "states": [
      "Washington (US-WA)"
    ]
  }
]

I want my users to be able to find all possible states via typeahead, so for a user request of "wa" I have the following query:

{
  "query": {
    "wildcard": { "states.raw": "*wa*" }
  },
  "aggregations": {
    "typed": {
      "terms": { "field": "states.raw" },
      "aggregations": {
        "typed_hits": {
          "top_hits": {
            "_source": { "includes": ["states"] }
          }
        }
      }
    }
  }
}

where states.raw is the raw sub-field of states.

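The query works well, except when a document holds an array of values as in the example: then it returns both Washington and California. I do understand why this happens (the query and the aggregations operate on whole documents, and even though only one of the values matches the filter, the document contains both), but I really only want to see Washington, and I don't want to add another layer of filtering of the ES results on the application side.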

Is it possible to do this with a single ES 2.4 request?
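The behaviour described in the question can be reproduced with a small toy model in Python. This is not Elasticsearch code: terms_agg is a hypothetical helper, and fnmatchcase stands in for the wildcard query because, like a query on a not_analyzed field, it is case-sensitive (hence the sketch searches for *Wa*). It only assumes that a document matches if any array value matches, and that the terms aggregation then buckets every value of each matching document, counting each document at most once per term.

```python
from collections import Counter
from fnmatch import fnmatchcase

# The two example documents from the question.
DOCS = [
    {"states": ["Washington (US-WA)", "California (US-CA)"]},
    {"states": ["Washington (US-WA)"]},
]


def terms_agg(docs, pattern, include=None):
    """Toy model of a wildcard query followed by a terms aggregation.

    A document matches if ANY value in its array matches `pattern`, and the
    aggregation then buckets EVERY distinct value of each matching document.
    `include`, when given, drops buckets whose term fails the predicate,
    which is roughly what the terms aggregation's `include` option does.
    """
    hits = [d for d in docs if any(fnmatchcase(v, pattern) for v in d["states"])]
    # Count each document at most once per term, like a bucket's doc_count.
    buckets = Counter(v for d in hits for v in set(d["states"]))
    if include is not None:
        buckets = Counter({term: n for term, n in buckets.items() if include(term)})
    return dict(buckets)


if __name__ == "__main__":
    # Without a bucket filter, California leaks in through the first document.
    print(terms_agg(DOCS, "*Wa*"))
    # Filtering the buckets themselves keeps only the matching value.
    print(terms_agg(DOCS, "*Wa*", include=lambda term: fnmatchcase(term, "*Wa*")))
```

The first call returns Washington (2 documents) and California (1 document); the second keeps only Washington, which is the result the question is after.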

2 Answers:

Answer 0 (score: 1)

You can use the "Filtering Values" feature of the terms aggregation (see https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-terms-aggregation.html#_filtering_values_2). It removes every bucket whose term does not match the include pattern, so California no longer shows up even though it shares a document with Washington. Your request could look like this:

POST /index/collection/_search?size=0
{
  "aggregations": {
    "typed": {
      "terms": {
        "field": "states.raw",
        "include": ".*wa.*" // Escape the user's "wa" input carefully: it becomes part of a regular expression
      },
      "aggregations": {
        "typed_hits": {
          "top_hits": {
            "_source": { "includes": ["states"] }
          }
        }
      }
    }
  }
}
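Because the include value is a Lucene regular expression that must match the whole term (hence the surrounding .*), raw user input has to be escaped before it is spliced in. The Python sketch below builds the request body under two assumptions of my own: that backslash-escaping every non-alphanumeric character is acceptable (in Lucene's regex syntax a backslash makes the next character literal), and that you want case-insensitive matching, which the Lucene 5.x regex syntax used by ES 2.4 has no flag for, so each letter is expanded into a two-case character class such as [wW]. typeahead_pattern and build_request are hypothetical names.

```python
import json
import re


def typeahead_pattern(user_input):
    """Turn raw typeahead input into a Lucene regex for the terms `include`.

    Assumptions: matching should be case-insensitive, and backslash-escaping
    every other non-alphanumeric character is an acceptable way to make it
    literal (in Lucene's regex syntax a backslash quotes the next character).
    """
    parts = []
    for ch in user_input:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{lower}{upper}]")  # cased letter: match either case
        elif ch.isalnum():
            parts.append(ch)  # digit or caseless letter: keep as-is
        else:
            parts.append("\\" + ch)  # punctuation or space: escape it
    # `include` must match the whole term, hence the leading and trailing .*
    return ".*" + "".join(parts) + ".*"


def build_request(user_input):
    """Hypothetical helper: the answer's request body, with size=0 moved into the body."""
    return {
        "size": 0,
        "aggregations": {
            "typed": {
                "terms": {"field": "states.raw", "include": typeahead_pattern(user_input)},
                "aggregations": {
                    "typed_hits": {"top_hits": {"_source": {"includes": ["states"]}}}
                },
            }
        },
    }


if __name__ == "__main__":
    pattern = typeahead_pattern("wa")
    # Python's re reads a simple pattern like this one the same way Lucene does,
    # which allows a quick local check before sending the request.
    for state in ("Washington (US-WA)", "California (US-CA)"):
        print(state, bool(re.fullmatch(pattern, state)))
    print(json.dumps(build_request("wa"), indent=2))
```

If you prefer case-sensitive matching, drop the first branch and escape only the punctuation.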

Answer 1 (score: 1)

Still, I can't resist telling you that a wildcard query with leading wildcards (*wa*) is not the best solution. Please, do consider using ngrams:

PUT states
{
  "settings": {
    "analysis": {
      "filter": {
        "ngrams": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "20"
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "filter": [
            "standard",
            "lowercase",
            "ngrams"
          ],
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "location": {
          "properties": {
            "states": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ngrams": {
                  "type": "string",
                  "analyzer": "ngram_analyzer"
                }
              }
            }
          }
        }
      }
    }
  }
}
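To see why a plain term query for a short fragment such as sh can replace the wildcard, here is a rough Python approximation of the terms that ngram_analyzer stores for a value. It is not Lucene: splitting on \w+ only stands in for the standard tokenizer (which follows Unicode word-break rules), and the standard token filter is left out because in 2.x it does nothing. ngram_terms is a hypothetical helper.

```python
import re


def ngram_terms(text, min_gram=2, max_gram=20):
    """Approximate the terms ngram_analyzer indexes for one string value."""
    # Stand-in for the standard tokenizer: runs of word characters,
    # followed by the lowercase filter.
    tokens = [tok.lower() for tok in re.findall(r"\w+", text)]
    grams = set()
    for tok in tokens:
        # nGram filter: every substring whose length is within [min_gram, max_gram].
        for size in range(min_gram, min(max_gram, len(tok)) + 1):
            for start in range(len(tok) - size + 1):
                grams.add(tok[start:start + size])
    return grams


if __name__ == "__main__":
    for value in ("Washington (US-WA)", "California (US-CA)", "Illinois (US-IL)"):
        print(value, "sh" in ngram_terms(value))
```

Of these three values only Washington (US-WA) yields the term sh, so an exact term lookup on the ngrams sub-field is enough; no leading-wildcard scan over the terms dictionary is needed.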


POST states/doc/1
{
  "text":"bla1",
  "location": [
    {
      "states": [
        "Washington (US-WA)",
        "California (US-CA)"
      ]
    },
    {
      "states": [
        "Washington (US-WA)"
      ]
    }
  ]
}
POST states/doc/2
{
  "text":"bla2",
  "location": [
    {
      "states": [
        "Washington (US-WA)",
        "California (US-CA)"
      ]
    }
  ]
}
POST states/doc/3
{
  "text":"bla3",
  "location": [
    {
      "states": [
        "California (US-CA)"
      ]
    },
    {
      "states": [
        "Illinois (US-IL)"
      ]
    }
  ]
}

And finally, the query:

GET states/_search
{
  "query": {
    "term": {
      "location.states.ngrams": {
        "value": "sh"
      }
    }
  },
  "aggregations": {
    "filtering_states": {
      "terms": {
        "field": "location.states.raw",
        "include": ".*sh.*"
      },
      "aggs": {
        "typed_hits": {
          "top_hits": {
            "_source": {
              "includes": [
                "location.states"
              ]
            }
          }
        }
      }
    }
  }
}
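As a rough check of what this request should return for the three documents indexed above, here is a toy Python model. It is a simplification, not Elasticsearch: the ngram term query is approximated by a lowercase substring test (close enough for a two-letter query like sh), the non-nested array of location objects is flattened so location.states holds every value of the document, and the terms aggregation counts each document at most once per term. search is a hypothetical helper.

```python
import re
from collections import Counter

# The three documents indexed above.
DOCS = {
    1: {"location": [{"states": ["Washington (US-WA)", "California (US-CA)"]},
                     {"states": ["Washington (US-WA)"]}]},
    2: {"location": [{"states": ["Washington (US-WA)", "California (US-CA)"]}]},
    3: {"location": [{"states": ["California (US-CA)"]},
                     {"states": ["Illinois (US-IL)"]}]},
}


def flattened_states(doc):
    """All location.states values of a document, deduplicated.

    A plain (non-nested) array of objects is flattened, so location.states
    behaves like one array holding every value; the set mirrors the fact that
    a term contributes at most once to a document's bucket count.
    """
    return {state for loc in doc["location"] for state in loc["states"]}


def search(query_gram, include_regex):
    # Query: the ngram term query, approximated by a lowercase substring test.
    hits = [doc_id for doc_id, doc in DOCS.items()
            if any(query_gram in s.lower() for s in flattened_states(doc))]
    # Aggregation: terms buckets over the hits, filtered by the `include` regex,
    # which has to match the whole (case-sensitive) raw term.
    counts = Counter(s for doc_id in hits for s in flattened_states(DOCS[doc_id])
                     if re.fullmatch(include_regex, s))
    return hits, dict(counts)


if __name__ == "__main__":
    print(search("sh", ".*sh.*"))
```

For sh this gives documents 1 and 2 and a single Washington (US-WA) bucket with a doc_count of 2: document 1 lists Washington twice but counts once, and include drops California even though both matching documents contain it.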