Question

I have two indices.

The first:

curl -XPUT 'http://localhost:9200/first/' -d '
{
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer":"spanish"
        }
      }
    }
  }
}
'

The second:

curl -XPUT 'http://localhost:9200/second/' -d '
{
  "mappings": {
      "product": {
        "properties": {
          "name": {
            "type": "string",
             "analyzer":"spanish_custom"
          }
        }
      }
    },
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stop": {
          "type":       "stop",
          "stopwordsPath":  "spanish_stop_custom.txt" 
        },
        "spanish_stemmer": {
          "type":       "stemmer",
          "language":   "spanish"
        }
      },
      "analyzer": {
        "spanish_custom": {
          "tokenizer":  "standard",
          "filter": [
            "standard",
            "lowercase",
            "spanish_stop",
            "spanish_stemmer"
          ]
        }
      }
    }
  }
}
'

I inserted a document into each index:

curl -XPOST 'http://localhost:9200/first/product' -d '
{
  "name": "Hidratante"
}'

curl -XPOST 'http://localhost:9200/second/product' -d '
{
  "name": "Hidratante"
}'

I checked the tokens produced for the name field:

curl -XGET 'http://localhost:9200/first/_analyze?field=name' -d 'hidratante'

{"tokens":[{"token":"hidratant","start_offset":0,"end_offset":10,"type":"<ALPHANUM>","position":1}]}



curl -XGET 'http://localhost:9200/second/_analyze?field=name' -d 'hidratante'

{"tokens":[{"token":"hidrat","start_offset":0,"end_offset":10,"type":"<ALPHANUM>","position":1}]}

I want to search for "hidratant" and get results from both indices, but I only get results from the first index.

My query:

curl -XGET 'http://127.0.0.1:9200/first/_search' -d '
{
  "query" : {
    "multi_match" : {
      "query" : "hidratant",
      "fields" : [ "name"],
      "type" : "phrase_prefix",
      "operator" : "AND",
      "prefix_length" : 3,
      "tie_breaker": 1
    }
  }
}
'

Result from the first index:

{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.5945348,"hits":[{"_index":"test","_type":"product","_id":"AVPxjvpRDl8qAEgsMFMu","_score":0.5945348,"_source":
{
  "name": "Hidratante"
}},{"_index":"test","_type":"product","_id":"AVPxkYbKDl8qAEgsMFMv","_score":0.5945348,"_source":
{
  "name": "Hidratante"
}}]}}

Result from the second index:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

Why does the second index return no results?

Answer 1

As you mention in the question, the token generated for the term Hidratante in the second index is:

{"tokens":[{"token":"hidrat","start_offset":0,"end_offset":10,"type":"<ALPHANUM>","position":1}]}

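The concept of a search analyzer comes into play when a search is executed. According to the documentation:

"By default, queries will use the analyzer defined in the field mapping when searching."

So when you run the phrase_prefix query, the same custom analyzer you created is applied to the query text for the name field in the second index.

Since you are searching for the keyword hidratant, it is analyzed as follows.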

For the first index:

curl -XGET 'http://localhost:9200/first/_analyze?field=name' -d 'hidratant'

{
"tokens": [
  {
     "token": "hidratant",
     "start_offset": 3,
     "end_offset": 12,
     "type": "<ALPHANUM>",
     "position": 1
    }
  ]
 }

That is why you get results from the first index.

For the second index:

curl -XGET 'http://localhost:9200/second/_analyze?field=name' -d 'hidratant'

 {
 "tokens": [
  {
     "token": "hidratant",
     "start_offset": 3,
     "end_offset": 12,
     "type": "<ALPHANUM>",
     "position": 1
   }
  ]
 }

The token generated at search time is hidratant, but the token stored at index time is hidrat. That is why you get no results in the second case.
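The mismatch can be reproduced without a running cluster. The following is a minimal sketch in plain shell, assuming that a single-term phrase_prefix query behaves like a prefix match of the analyzed query token against the indexed token. The helper is_prefix_match is hypothetical and not part of Elasticsearch; the token values are copied from the _analyze output shown in the question and in this answer.

```shell
# Hypothetical helper (not an Elasticsearch API): succeeds when the
# query token is a prefix of the indexed token.
is_prefix_match() {
  indexed_token=$1
  query_token=$2
  case "$indexed_token" in
    "$query_token"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Tokens copied from the _analyze output above.
query_token="hidratant"     # search-time token for "hidratant" (both indices)
first_indexed="hidratant"   # index-time token for "Hidratante" in first
second_indexed="hidrat"     # index-time token for "Hidratante" in second

# Compare the search-time token against each index-time token.
for indexed in "$first_indexed" "$second_indexed"; do
  if is_prefix_match "$indexed" "$query_token"; then
    echo "$indexed: match"
  else
    echo "$indexed: no match"
  fi
done
```

Under that assumption the loop reports a match for hidratant and no match for hidrat. By the same reasoning, searching the second index for the full word hidratante should match, because the _analyze output in the question shows that it is stemmed to hidrat by the same analyzer.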
