Elasticsearch: simple_query_string and multi-word synonyms

Time: 2019-04-03 15:29:44

Tags: elasticsearch full-text-search

I have a field with the following search_analyzer:

"name_search_en" : {
   "filter" : [
     "english_possessive_stemmer",
     "lowercase",
     "name_synonyms_en",
     "english_stop",
     "english_stemmer",
     "asciifolding"
   ],
   "tokenizer" : "standard"
}

name_synonyms_en is a synonym_graph filter that looks like this:

"name_synonyms_en" : {
   "type" : "synonym_graph",
   "synonyms" : [
     "beach bag => straw bag,beach bag",
     "bicycle,bike"
   ]
}

Running the following multi_match query applies the synonyms correctly:

{
  "query": {
    "multi_match": {
      "query": "beach bag",
      "auto_generate_synonyms_phrase_query": false,
      "type": "cross_fields",
      "fields": [
        "brand.en-US^1.0",
        "name.en-US^1.0"
      ]
    }
  }
}

Here is the _validate explain output. As expected, both beach bag and straw bag are present in the raw query:

"explanations" : [
{
  "index" : "d7598351-311f-4844-bb91-4f26c9f538f3",
  "valid" : true,
  "explanation" : "+((((+name.en-US:straw +name.en-US:bag) (+name.en-US:beach +name.en-US:bag))) | (brand.en-US:beach brand.en-US:bag)) #DocValuesFieldExistsQuery [field=_primary_term]"
}
]

I would expect the same synonym expansion from the following simple_query_string:

{
  "query": {
    "simple_query_string": {
      "query": "beach bag",
      "auto_generate_synonyms_phrase_query": false,
      "fields": [
        "brand.en-US^1.0",
        "name.en-US^1.0"
      ]
    }
  }
}

However, the straw bag synonym is not present in the raw query:

"explanations" : [
{
  "index" : "d7598351-311f-4844-bb91-4f26c9f538f3",
  "valid" : true,
  "explanation" : "+((name.en-US:beach | brand.en-US:beach)~1.0 (name.en-US:bag | brand.en-US:bag)~1.0) #DocValuesFieldExistsQuery [field=_primary_term]"
}
]

The problem seems to be limited to multi-word synonyms. If I search for bike, the raw query correctly includes the synonym bicycle:

"explanations" : [
{
  "index" : "d7598351-311f-4844-bb91-4f26c9f538f3",
  "valid" : true,
  "explanation" : "+(Synonym(name.en-US:bicycl name.en-US:bike) | brand.en-US:bike)~1.0 #DocValuesFieldExistsQuery [field=_primary_term]"
}
]

Is this the expected behavior (that is, does this query not support multi-word synonyms)?

1 Answer:

Answer 0: (score: 0)

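By default, simple_query_string enables all flags, including WHITESPACE, so the input text is split on whitespace before it reaches the analyzer. The synonym filter then sees beach and bag as separate inputs and cannot match the multi-word rule. The following query disables all flags, which makes multi-word synonyms work as expected: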

{
  "query": {
    "simple_query_string": {
      "query": "beach bag",
      "auto_generate_synonyms_phrase_query": false,
      "flags": "NONE", 
      "fields": [
        "brand.en-US^1.0",
        "name.en-US^1.0"
      ]
    }
  }
}
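
To make the difference concrete, here is a deliberately simplified Python sketch. It is a toy model, not how Elasticsearch or Lucene is actually implemented: the `RULES` table and the functions `expand`, `analyze_whole` and `analyze_per_word` are hypothetical names made up for this illustration, and a rule only matches when a whole chunk equals its left-hand side.

```python
# Toy model only: this is NOT the Elasticsearch/Lucene implementation.
# It mirrors one rule from the question: "beach bag => straw bag,beach bag".
RULES = {("beach", "bag"): [("straw", "bag"), ("beach", "bag")]}


def expand(tokens):
    """Return the alternative token sequences for one chunk of text.

    A rule matches only if the whole chunk equals its left-hand side,
    which is enough for this illustration.
    """
    key = tuple(tokens)
    return RULES.get(key, [key])


def analyze_whole(text):
    """Hand the full query text to the synonym step in one piece
    (roughly what happens with "flags": "NONE")."""
    return [expand(text.lower().split())]


def analyze_per_word(text):
    """Split on whitespace first, then analyze each word on its own
    (roughly what happens when the WHITESPACE flag is enabled)."""
    return [expand([word]) for word in text.lower().split()]


print(analyze_whole("beach bag"))
print(analyze_per_word("beach bag"))
```

In the first call the synonym step sees both words together and can produce the straw bag alternative. In the second call it only ever sees one word at a time, so the two-word rule never fires.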

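Unfortunately, setting flags to NONE does not work well together with the minimum_should_match parameter. The full discussion and more details can be found here: https://discuss.elastic.co/t/simple-query-string-and-multi-terms-synonyms/174780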
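
To check how a given variant is rewritten, the same _validate explain output shown in the question can be requested again. The sketch below only builds the HTTP request with the Python standard library and does not send it. The host `http://localhost:9200`, the index name `my-index` and the helper `build_validate_request` are assumptions for illustration, so replace them with real values.

```python
import json
from urllib.request import Request

# Assumptions: a local node and a placeholder index name.
BASE_URL = "http://localhost:9200"
INDEX = "my-index"


def build_validate_request(query_body):
    """Build (but do not send) a _validate/query request with explain enabled."""
    url = f"{BASE_URL}/{INDEX}/_validate/query?explain=true"
    data = json.dumps(query_body).encode("utf-8")
    return Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# The query from the answer, with all simple_query_string flags disabled.
query = {
    "query": {
        "simple_query_string": {
            "query": "beach bag",
            "auto_generate_synonyms_phrase_query": False,
            "flags": "NONE",
            "fields": ["brand.en-US^1.0", "name.en-US^1.0"],
        }
    }
}

request = build_validate_request(query)
print(request.full_url)

# Sending the request needs a running cluster, for example:
# from urllib.request import urlopen
# print(urlopen(request).read().decode("utf-8"))
```

Comparing the `explanation` string with and without `"flags": "NONE"` should show whether the straw bag clause is present.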