ElasticSearch synonym and word delimiter analyzers are incompatible

Date: 2015-06-10 22:23:16

Tags: lucene elasticsearch

I have the mapping document below. To be exact, it applies a word delimiter analyzer to the model field at both index and search time, and a synonym analyzer only at search time, to analyze the search string.

Mapping

POST /stackoverflow
{
"settings":{
    "analysis":{
        "analyzer":{
            "keyword_analyzer":{
                "tokenizer":"keyword",
                "filter":[
                    "lowercase",
                    "asciifolding"
                ]
            },
            "synonym_analyzer":{
                "tokenizer":"standard",
                "filter":[
                    "lowercase",
                    "synonym"
                ]
            },
            "word_delimiter_analyzer":{
                "tokenizer":"whitespace",
                "filter":[
                    "lowercase",
                    "word_delimiter"

                ],
                "ignore_case":true
            }
        },
        "filter":{
            "synonym":{
                "type":"synonym",
                "synonyms_path":"synonyms.txt"
            },
            "word_delimiter":{
              "type":"word_delimiter",
              "generate_word_parts":true,
              "preserve_original": true
            }
        }
    }
},
"mappings":{
    "vehicles":{
        "dynamic":"false",
        "dynamic_templates":[
            {
                "no_index_template":{
                    "match":"*",
                    "mapping":{
                        "index":"no",
                        "include_in_all":false
                    }
                }
            }
        ],
        "_all":{
            "enabled":false
        },
        "properties":{
            "id":{
                "type":"long",
                "ignore_malformed":true
            },
            "model":{
                "type":"nested",
                "include_in_root":true,
                "properties":{
                    "label":{
                        "type":"string",
                        "analyzer": "word_delimiter_analyzer"
                    }
                }
            },
            "make":{
                "type":"String",
                "analyzer":"keyword_analyzer"
            }
        }
    }
}
}

And here is some sample data:

POST /stackoverflow/vehicles/6
{
    "make" : "chevrolet",
    "model" : {
       "label" : "Silverado 2500HD"
    }
}
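To see what actually gets indexed for model.label, the custom analyzer can be checked with the _analyze API (a diagnostic sketch, assuming the ES 1.x query-string syntax):

GET /stackoverflow/_analyze?analyzer=word_delimiter_analyzer&text=Silverado%202500HD

With generate_word_parts and preserve_original enabled, this should emit the tokens silverado, 2500hd, 2500 and hd.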

Here is the search query:

GET /stackoverflow/_search?explain
{  
   "from":0,
   "size":10,
   "query":{  
       "filtered":{  
         "query":{ 
         "multi_match":{  
            "query":"HD2500",
             "fields":[  
                "make","model.label"
              ],
            "type":"cross_fields","operator" : "OR",
            "analyzer" : "synonym_analyzer"
          }
       }
    }
   }
 }

The search query above does not work; however, if I remove the synonym_analyzer from the search query, it works perfectly. I really do not understand the logic behind how the synonym analyzer tampers with the results.
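To compare what the two analyzers produce for the query string, the same _analyze diagnostic can be run with each of them (again assuming ES 1.x syntax):

GET /stackoverflow/_analyze?analyzer=synonym_analyzer&text=HD2500

GET /stackoverflow/_analyze?analyzer=word_delimiter_analyzer&text=HD2500

The standard tokenizer does not split on letter/digit boundaries, so the first call should return the single token hd2500, while the second should return hd2500, hd and 2500.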

In my synonyms.txt file there is no reference to HD2500 at all, and all the synonym analyzer should do is split the tokens on whitespace, lowercase them, try to match them against the synonym strings, and then pass them on to the field-level analyzers. I am confused about where this is breaking.
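For reference, synonyms.txt uses the standard Solr synonym format; the actual entries are not reproduced here, but a hypothetical file would look like:

chevy, chevrolet
suv => sport utility vehicle

With expand set to false, equivalence lists such as the first line are all mapped to the first term in the list rather than being expanded to every variant.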

Any help is greatly appreciated.

0 Answers:

No answers yet