Nested ElasticSearch query returns too many items

Time: 2014-06-06 09:56:24

Tags: elasticsearch

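The nested ElasticSearch query below returns hits that should not match. Many of the results do not contain the requested order number but are listed anyway. I am not getting back every document, so the query is clearly narrowing the result set in some way.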

{
  "query": {
    "nested": {
      "path": "orders",
      "query": {
        "match": {
          "orderNumber": "242347"
        }
      }      
    }
  }
}

Query results (truncated):

{
  "took":0,
  "timed_out":false,
  "_shards": {
    "total":1,
    "successful":1,
    "failed":0
  },
  "hits": {
    "total":60,
    "max_score":9.656103,
    "hits":[
      {
        "_index": "index1",
        "_type":"documenttype1",
        "_id":"mUmudQrVSC6rn68ujDJ8iA",
        "_score":9.656103,
        "_source" : {
          "documentId": 12093894,
          "orders": [
          {
            "customerId": 129048669,
            "orderNumber": "242347", // <-- CORRECT HIT ON ORDER
          },
          {
            "customerId": 229405848,
            "orderNumber": "431962"
          }
          ]
        }
      },
      {
        "_index":"index1",
        "_type":"documenttype1",
        "_id":"9iO5QBCpT_6kmH3CoBTdWw",
        "_score":9.656103, 
        "_source" : {
          "documentId": 43390283,
          // <-- ORDER ISN'T HERE BUT THE DOCUMENT IS HIT NEVERTHELESS!
          "orders": [
          {
            "customerId": 229405848,
            "orderNumber": "431962"
          },
          {
            "customerId": 129408979,
            "orderNumber": "142701"
          }
          ]
        }
      }
      // Left out 58 more results most of which do not contain
      // the requested order number.
    ]
  }
}

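As you can see, there is a hit (in fact, quite a few of them) that should not be there, because none of its orders contains the requested order number.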

This is the mapping for documenttype1:

{
   "index1":{
      "properties":{
         "documentId":{
            "type":"integer"
         },
         "orders":{
            "type":"nested",
            "properties":{
               "customerId":{
                  "type":"integer"
               },
               "orderNumber":{
                  "type":"string",
                  "analyzer":"custom_internal_code"
               }
            }
         }
      }
   }
}

Finally, to clarify, here are the settings for the custom_internal_code analyzer referenced in the mapping above:

{
   "index1":{
      "settings":{
         "index.analysis.analyzer.custom_internal_code.filter.1":"asciifolding",
         "index.analysis.analyzer.custom_internal_code.type":"custom",
         "index.analysis.analyzer.custom_internal_code.filter.0":"lowercase",
         "index.analysis.analyzer.custom_internal_code.tokenizer":"keyword"
      }
   }
}
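
To make the analyzer settings concrete, here is a rough Python approximation of what custom_internal_code does to a value: the keyword tokenizer keeps the whole input as a single token, which is then lowercased and ASCII-folded. The function name `approximate_custom_internal_code` is hypothetical, and the Unicode decomposition used here only approximates the asciifolding filter; it is a sketch, not Elasticsearch's implementation.

```python
import unicodedata


def approximate_custom_internal_code(value):
    """Approximate the custom_internal_code analyzer (hypothetical helper).

    keyword tokenizer -> the whole input stays one token
    filter.0 lowercase -> lowercase the token
    filter.1 asciifolding -> approximated by dropping combining marks
    """
    token = value.lower()
    # Assumption: NFKD decomposition plus dropping non-ASCII code points is
    # only a rough stand-in for Elasticsearch's asciifolding filter.
    folded = unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode("ascii")
    return [folded]


if __name__ == "__main__":
    print(approximate_custom_internal_code("242347"))
```

Under this approximation a purely numeric order number such as "242347" is indexed as one unchanged token, so the analyzer by itself should not cause different order numbers to match each other.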

2 Answers:

Answer 0 (score: 2)

Answer 1 (score: 0)

It looks like you should use a bool query instead of match.

However, if you only want to filter records, you should use a nested filter instead of a query. It runs faster because no scores are computed.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-filter.html

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested": {
          "path": "orders",
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "orderNumber": "242347"
                  }
                }
              ]
            }
          },
          "_cache": true
        }
      }
    }
  }
}
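
The same request body can be sketched as a small Python helper, which makes it easier to vary the order number. The function name `build_nested_order_filter` is hypothetical. One deliberate difference, flagged as an assumption: the Elasticsearch documentation for nested queries asks for fully qualified field names inside the nested clause, so this sketch uses `orders.orderNumber` rather than the bare `orderNumber`. Whether that accounts for the unexpected hits in the question has not been verified here.

```python
import json


def build_nested_order_filter(order_number, path="orders", field="orderNumber"):
    """Build a filtered query with a nested term filter (hypothetical helper)."""
    # Assumption: the field is referenced by its full path, e.g. orders.orderNumber.
    full_field = "{}.{}".format(path, field)
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "nested": {
                        "path": path,
                        "filter": {
                            "bool": {
                                "must": [{"term": {full_field: order_number}}]
                            }
                        },
                        "_cache": True,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the _search endpoint.
    print(json.dumps(build_nested_order_filter("242347"), indent=2))
```

The helper only builds the body; sending it to the cluster is left to whatever HTTP or Elasticsearch client is already in use.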