Match query returns no results with a custom analyzer

Asked: 2016-04-07 16:53:03

Tags: elasticsearch

I have two indices:

First:

curl -XPUT 'http://localhost:9200/first/' -d '
{
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer":"spanish"
        }
      }
    }
  }
}
'

Second:

curl -XPUT 'http://localhost:9200/second/' -d '
{
  "mappings": {
      "product": {
        "properties": {
          "name": {
            "type": "string",
             "analyzer":"spanish_custom"
          }
        }
      }
    },
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stop": {
          "type":       "stop",
          "stopwordsPath":  "spanish_stop_custom.txt" 
        },
        "spanish_stemmer": {
          "type":       "stemmer",
          "language":   "spanish"
        }
      },
      "analyzer": {
        "spanish_custom": {
          "tokenizer":  "standard",
          "filter": [
            "standard",
            "lowercase",
            "spanish_stop",
            "spanish_stemmer"
          ]
        }
      }
    }
  }
}
'

I inserted some documents into both indices:

curl -XPOST 'http://localhost:9200/first/product' -d '
{
  "name": "Hidratante"
}'

curl -XPOST 'http://localhost:9200/second/product' -d '
{
  "name": "Hidratante"
}'

I checked the tokens generated for the name field:

curl -XGET 'http://localhost:9200/first/_analyze?field=name' -d 'hidratante'

{"tokens":[{"token":"hidratant","start_offset":0,"end_offset":10,"type":"<ALPHANUM>","position":1}]}



curl -XGET 'http://localhost:9200/second/_analyze?field=name' -d 'hidratante'

{"tokens":[{"token":"hidrat","start_offset":0,"end_offset":10,"type":"<ALPHANUM>","position":1}]}

I want to search for "hidratant" and get results from both indices, but I only get results from the first index.

My query:

curl -XGET 'http://127.0.0.1:9200/first/_search' -d '
{
  "query" : {
    "multi_match" : {
      "query" : "hidratant",
      "fields" : [ "name"],
      "type" : "phrase_prefix",
      "operator" : "AND",
      "prefix_length" : 3,
      "tie_breaker": 1
    }
  }
}
'

First index result:

{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.5945348,"hits":[{"_index":"test","_type":"product","_id":"AVPxjvpRDl8qAEgsMFMu","_score":0.5945348,"_source":
{
  "name": "Hidratante"
}},{"_index":"test","_type":"product","_id":"AVPxkYbKDl8qAEgsMFMv","_score":0.5945348,"_source":
{
  "name": "Hidratante"
}}]}}

Second index result:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

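Why does the second index return no results?

1 answer:

Answer 0: (score: 0)

As you mentioned in your question above, for the second index the token generated for the term Hidratante is: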

{"tokens":[{"token":"hidrat","start_offset":0,"end_offset":10,"type":"<ALPHANUM>","position":1}]}

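When a search is executed, the concept of a search analyzer comes into play. According to the documentation:

By default, queries will use the analyzer defined in the field mapping when searching.

So when you run the phrase_prefix query, the same custom analyzer you created is applied to the query text for the name field in the second index.

You are searching for the keyword: hidratant

It is analyzed as follows.

First index: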

curl -XGET 'http://localhost:9200/first/_analyze?field=name' -d 'hidratant'

{
"tokens": [
  {
     "token": "hidratant",
     "start_offset": 3,
     "end_offset": 12,
     "type": "<ALPHANUM>",
     "position": 1
    }
  ]
 }

That is why you get results from the first index.

For the second index:

curl -XGET 'http://localhost:9200/second/_analyze?field=name' -d 'hidratant'

 {
 "tokens": [
  {
     "token": "hidratant",
     "start_offset": 3,
     "end_offset": 12,
     "type": "<ALPHANUM>",
     "position": 1
   }
  ]
 }

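The token generated at search time is hidratant, but the token stored at index time is hidrat. A phrase_prefix query on a single term looks for indexed tokens that start with the search-time token, and hidrat does not start with hidratant. That is why you get no results in the second case.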
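
To make the mismatch concrete, here is a minimal Python sketch of the comparison described above. It does not call Elasticsearch or re-implement its stemmers: the tokens are copied from the _analyze outputs quoted in this thread, and the helper name prefix_hit is hypothetical. It also assumes that a single-term phrase_prefix query can be approximated as "the indexed token must start with the query token", which is a simplification of what the query really does.

```python
# Tokens copied from the _analyze outputs quoted in this thread.
# Nothing here talks to Elasticsearch; this is only a toy model.
INDEX_TIME_TOKEN = {"first": "hidratant", "second": "hidrat"}      # from "Hidratante"
SEARCH_TIME_TOKEN = {"first": "hidratant", "second": "hidratant"}  # from "hidratant"


def prefix_hit(indexed_token: str, query_token: str) -> bool:
    """Simplified single-term phrase_prefix check (an assumption of this sketch):
    the analyzed query token must be a prefix of the indexed token."""
    return indexed_token.startswith(query_token)


# Compare the search-time token with the index-time token for each index.
results = {
    index: prefix_hit(INDEX_TIME_TOKEN[index], SEARCH_TIME_TOKEN[index])
    for index in ("first", "second")
}

for index, hit in results.items():
    print(f"{index}: {'match' if hit else 'no match'}")
```

Under these assumptions the first index matches and the second does not, which mirrors the search results shown in the question.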