Why does a query against an edge ngram Elasticsearch mapping return no results?

Time: 2017-10-27 05:20:26

Tags: elasticsearch elasticsearch-5

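Below are the mapping and analyzer settings. Suppose I am indexing "book" records. Several fields on a book record (for example, publishers and tags) are arrays of strings (e.g., ["Random House", "macmillan"]), while the "name" field takes a single string such as "Blue".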

{
   "state": "open",
   "settings": {
      "index": {
         "number_of_shards": "5",
         "provided_name": "autocomplete_index",
         "creation_date": "1509080632268",
         "analysis": {
            "filter": {
               "edge_ngram": {
                  "token_chars": [
                     "letter",
                     "digit"
                  ],
                  "min_gram": "1",
                  "type": "edgeNGram",
                  "max_gram": "15"
               },
               "english_stemmer": {
                  "name": "possessive_english",
                  "type": "stemmer"
               }
            },
            "analyzer": {
               "keyword_analyzer": {
                  "filter": [
                     "lowercase",
                     "english_stemmer"
                  ],
                  "type": "custom",
                  "tokenizer": "standard"
               },
               "autocomplete_analyzer": {
                  "filter": [
                     "lowercase",
                     "asciifolding",
                     "english_stemmer",
                     "edge_ngram"
                  ],
                  "type": "custom",
                  "tokenizer": "standard"
               }
            }
         },
         "number_of_replicas": "1",
         "uuid": "SSTzdTNFStaSiIBu-l3q5w",
         "version": {
            "created": "5060299"
         }
      }
   },
   "mappings": {
      "autocomplete_mapping": {
         "properties": {
            "publishers": {
               "type": "text",
               "fields": {
                  "keyword": {
                     "ignore_above": 256,
                     "type": "keyword"
                  }
               }
            },
            "name": {
               "type": "text",
               "fields": {
                  "keyword": {
                     "ignore_above": 256,
                     "type": "keyword"
                  }
               }
            },
            "tags": {
               "type": "text",
               "fields": {
                  "keyword": {
                     "ignore_above": 256,
                     "type": "keyword"
                  }
               }
            }
         }
      }
   },
   "aliases": [],
   "primary_terms": {
      "0": 1,
      "1": 1,
      "2": 1,
      "3": 1,
      "4": 1
   },
   "in_sync_allocations": {
      "0": [
         "GXwYiYuWQ16wgxCrpXShJQ"
      ],
      "1": [
         "Do_49lZ4QmyNEYUK_QJfEQ"
      ],
      "2": [
         "vWZ_PjsLSGSVh130C5EvYQ"
      ],
      "3": [
         "5CLINaFJQbqVcZLVOsSNWQ"
      ],
      "4": [
         "hy3JYfmuR7e8fc-anu-heA"
      ]
   }
}

If I run a query such as:

curl -XGET 'localhost:9200/autocomplete_index/_search?size=5' -d '
{
"query" : {
    "multi_match" : {
      "query": "b",
      "analyzer": "keyword",
      "fields": ["_all"]
    }
  }
}'

I get 0 results. I have to enter the full word "blue" in the query field to get a match.

Also, when I run "_analyze", I get:

curl -XGET 'localhost:9200/products_autocomplete_dev/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "analyzer": "autocomplete_analyzer",
  "field": "name",
  "text": "b"
}
'

{
  "tokens" : [
    {
      "token" : "b",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

I would expect to get at least tokens such as "b", "bl", "blu", and "blue".

Here is a sample document from the index:

{
  "_index" : "autocomplete_index",
  "_type" : "autocomplete_mapping",
  "_id" : "145",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name": "Blue",
    "publishers" : [
      "macmillan",
      "Penguin"
    ],
    "themes" : [
      "Butterflies", "Mammals"
    ]
  }
}

What am I doing wrong?

1 answer:

Answer 0: (score: 0)

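There are so many things wrong here that I suggest you read the documentation on analyzers carefully. I hope you don't mind my saying so.

First, if you want to test an analyzer, don't specify a field name; just specify the text and the analyzer itself: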

GET /my_index/_analyze?pretty
{
  "analyzer": "autocomplete_analyzer",
  "text": "blue"
}
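
For reference, an edge n-gram filter with min_gram 1 and max_gram 15 should turn the token "blue" into its prefixes, while a one-letter input such as "b" can only ever produce "b". The Python sketch below merely imitates that single step locally so you know roughly what to look for in the _analyze output. The function name edge_ngrams is made up for illustration, and this is not Elasticsearch code:

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` with length between min_gram and max_gram.

    Hypothetical helper that only imitates what an edge n-gram token filter
    does to a single token; it is not part of any Elasticsearch client.
    """
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(edge_ngrams("blue"))  # ['b', 'bl', 'blu', 'blue']
    print(edge_ngrams("b"))     # ['b']
```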

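If you define a custom analyzer, how is Elasticsearch supposed to know that a particular field uses it? Defining an analyzer is not the same as assigning it to a specific field. So: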

    "name": {
      "type": "text",
-->   "analyzer": "autocomplete_analyzer",
      "fields": {
        "keyword": {
          "ignore_above": 256,
          "type": "keyword"
        }
      }
    }
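
To see why this matters for the query "b", here is a rough Python model of the two cases. It assumes that a field without an explicit analyzer is indexed as lowercased whole words (a simplification of the standard analyzer) and that a query term matches only when it is exactly one of the indexed terms. All function names are hypothetical and chosen for illustration:

```python
def standard_terms(text):
    # Rough stand-in for the standard analyzer: split on whitespace, lowercase.
    return [word.lower() for word in text.split()]


def autocomplete_terms(text, min_gram=1, max_gram=15):
    # Same words, but each one expanded into its prefixes (edge n-grams).
    terms = []
    for word in standard_terms(text):
        longest = min(len(word), max_gram)
        terms.extend(word[:n] for n in range(min_gram, longest + 1))
    return terms


def matches(query_term, indexed_terms):
    # Simplified matching: the query term must equal an indexed term.
    return query_term in indexed_terms


if __name__ == "__main__":
    print(matches("b", standard_terms("Blue")))      # False: only "blue" is indexed
    print(matches("blue", standard_terms("Blue")))   # True: the full word matches
    print(matches("b", autocomplete_terms("Blue")))  # True: "b" is an indexed prefix
```

The first two lines mirror what you observed: without the analyzer on the field, only the complete word "blue" finds the document.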

The same goes for the _all field: by default it uses the standard analyzer, and it will keep using it unless you change that:

  "mappings": {
    "autocomplete_mapping": {
      "_all": {
        "analyzer": "autocomplete_analyzer"
      }, 
      "properties": {
        "publishers": {
          "type": "text", 
          "fields": {
.....