Multi-word synonyms for full-text documents in Elasticsearch

Time: 2018-05-28 13:15:50

Tags: elasticsearch

Suppose we have the following mapping in Elasticsearch:

  PUT /synonyms_test/
  {
     "settings": {
        "index": {
           "max_result_window": "5000000",
           "queries.cache.enabled": true,
           "requests.cache.enable": true
        },
        "analysis": {
           "filter": {
              "synonym_filter": {
                 "type": "synonym",
                 "synonyms": [
                    "USA, America, United States of America, The United States"
                 ],
                 "tokenizer": "keyword"
              }
           },
           "analyzer": {
              "synonyms_analyzer": {
                 "filter": [
                    "synonym_filter",
                    "lowercase"
                 ],
                 "tokenizer": "standard"
              }
           }
        }
     },
     "mappings": {
        "synonyms_index": {
           "properties": {
              "full_text": {
                 "type": "text",
                 "analyzer": "synonyms_analyzer",
                 "search_analyzer": "synonyms_analyzer"
              }
           }
        }
     }
  }

Here are three indexed documents that contain the synonyms.

  POST synonyms_test/synonyms_index/1
  {
     "full_text": "Washington is capital of USA"
  }

  POST synonyms_test/synonyms_index/2
  {
     "full_text": "Washington is capital of the America"
  }

  POST synonyms_test/synonyms_index/3
  {
     "full_text": "Washington is capital of the United States of America"
  }

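Searching with a multi-word synonym does not work. I expect "United States of America" to be expanded into its synonyms inside Elasticsearch, so that the query matches all three documents.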

  GET synonyms_test/synonyms_index/_search
  {
     "query": {
        "match": {
           "full_text": {
              "query": "Washington United States of America",
              "operator": "And"
           }
        }
     }
  }

If I change the tokenizer in synonym_filter to standard, then even the input "States" on its own returns all three documents, which I do not want.

1 answer:

Answer 0: (score: 0)

You should use synonym replacement rather than an equivalent-synonym list. So change

"USA, America, United States of America, The United States" 
to
"America, United States of America, The United States=>USA"
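Applied to the filter definition from the question, the change might look like the fragment below. This is only a sketch: it shows just the `type` and `synonyms` keys, and whether the `tokenizer` setting from the question should be kept is an open point that this answer does not address. Because the filter is also used at index time, existing documents would need to be reindexed before a changed rule affects them.

```json
"synonym_filter": {
   "type": "synonym",
   "synonyms": [
      "America, United States of America, The United States=>USA"
   ]
}
```

Note that in the question's analyzer the synonym filter runs before `lowercase`, so the rule is compared against tokens in their original case.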

For details, see the guide: https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-word-synonyms.html
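To check what a synonym rule actually does, the `_analyze` API returns the tokens an analyzer produces for a given text. A minimal sketch, assuming the `synonyms_test` index and the `synonyms_analyzer` from the question exist; the sample text is simply the query string used there:

```json
GET synonyms_test/_analyze
{
   "analyzer": "synonyms_analyzer",
   "text": "Washington United States of America"
}
```

If a replacement rule is being applied, the multi-word phrase should come back as a single replacement token rather than as four separate words. If the four words come back unchanged, the rule is not matching the token stream.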