Using a normalizer with the keyword data type in Elasticsearch gives unexpected results

Date: 2017-10-31 11:45:02

Tags: elasticsearch

I created an index:

PUT twitter
{
  "settings": {
    "index": {
      "analysis": {
        "normalizer": {
          "caseinsensitive_exact_match_normalizer": {
            "filter": "lowercase",
            "type": "custom"
          }
        },
        "analyzer": {
          "whitespace_lowercasefilter_analyzer": {
            "filter": "lowercase",
            "char_filter": "html_strip",
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }
  },

  "mappings": {
    "test" : {
      "properties": {
        "col1" : {
          "type": "keyword"
        },
        "col2" : {
          "type": "keyword",
          "normalizer": "caseinsensitive_exact_match_normalizer"
        }
      }
    }
  }
}

Then I indexed a document with these values:

POST twitter/test
{
  "col1" : "Dhruv",
  "col2" : "Dhruv"
}

Then I queried the index with:

GET twitter/_search
{
  "query": {
    "term": {
      "col2": {
        "value": "DHRUV"
      }
    }
  }
}

And I got this result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "twitter",
        "_type": "test",
        "_id": "AV9yNWQb3aJEm8NgRhd_",
        "_score": 0.2876821,
        "_source": {
          "col1": "Dhruv",
          "col2": "Dhruv"
        }
      }
    ]
  }
}

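As I understand it, we should not get a result: a term query skips analysis, so it should look up DHRUV in the inverted index, while the value stored in the index should be dhruv because col2 uses caseinsensitive_exact_match_normalizer. I suspect that the term query does not ignore the normalizer. Is that right?

I am using ES 5.4.1.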

1 Answer:

Answer 0 (score: 3)

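It seems it's normal for the term query to take the normalizer into account at search time. However, as stated in the related issue linked above, it has been determined that this is not the expected behavior.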

If you want to see how ES rewrites your query, you can use the following:

GET /_validate/query?index=twitter&explain
{
  "query": {
    "term": {
      "col2": {
        "value": "DHRUV"
      }
    }
  }
}

This shows why you get these results:

  "explanations": [
    {
      "index": "twitter",
      "valid": true,
      "explanation": "col2:dhruv"
    }
  ]
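
To make the rewritten query `col2:dhruv` more concrete, here is a toy Python model of the behavior described above. It is not Elasticsearch code: `ToyKeywordField` and `lowercase_normalizer` are hypothetical names made up for illustration, and the model only assumes that a keyword field stores each value as a single term and that the normalizer runs on both the indexed value and the term query value.

```python
from collections import defaultdict


def lowercase_normalizer(value: str) -> str:
    """Stand-in for the custom normalizer with a lowercase filter."""
    return value.lower()


class ToyKeywordField:
    """Toy keyword field: every value is indexed as one unsplit term."""

    def __init__(self, normalizer=None):
        self.normalizer = normalizer
        self.postings = defaultdict(set)  # term -> set of doc ids

    def _normalize(self, value: str) -> str:
        # Assumption: the same normalizer is applied at index and search time.
        return self.normalizer(value) if self.normalizer else value

    def index(self, doc_id: int, value: str) -> None:
        self.postings[self._normalize(value)].add(doc_id)

    def term_query(self, value: str) -> set:
        # The query value is normalized before the exact-term lookup.
        return self.postings.get(self._normalize(value), set())


col1 = ToyKeywordField()                                  # no normalizer
col2 = ToyKeywordField(normalizer=lowercase_normalizer)   # like col2 in the mapping

col1.index(1, "Dhruv")
col2.index(1, "Dhruv")

print(col1.term_query("DHRUV"))  # set()  -> no match without a normalizer
print(col2.term_query("DHRUV"))  # {1}    -> "DHRUV" is looked up as "dhruv"
```

In this sketch, col2 stores the term dhruv and the query value DHRUV is lowercased before the lookup, which mirrors the `col2:dhruv` explanation. The same query against col1 finds nothing, because there the stored term is still Dhruv.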