Suppose we have the following index settings and mapping in Elasticsearch:
PUT /synonyms_test/
{
  "settings": {
    "index": {
      "max_result_window": "5000000",
      "queries.cache.enabled": true,
      "requests.cache.enable": true
    },
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "USA, America, United States of America, The United States"
          ],
          "tokenizer": "keyword"
        }
      },
      "analyzer": {
        "synonyms_analyzer": {
          "filter": [
            "synonym_filter",
            "lowercase"
          ],
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "synonyms_index": {
      "properties": {
        "full_text": {
          "type": "text",
          "analyzer": "synonyms_analyzer",
          "search_analyzer": "synonyms_analyzer"
        }
      }
    }
  }
}
Here are three indexed documents, each containing one of the synonyms.
POST synonyms_test/synonyms_index/1
{
  "full_text": "Washington is capital of USA"
}
POST synonyms_test/synonyms_index/2
{
  "full_text": "Washington is capital of the America"
}
POST synonyms_test/synonyms_index/3
{
  "full_text": "Washington is capital of the United States of America"
}
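Searching with a multi-word synonym does not work. I expected "United States of America" in the query to be expanded to its synonyms, so that Elasticsearch would match all three documents.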
GET synonyms_test/synonyms_index/_search
{
  "query": {
    "match": {
      "full_text": {
        "query": "Washington United States of America",
        "operator": "And"
      }
    }
  }
}
If I change the tokenizer in synonym_filter to standard, then even a query for just "States" returns all three documents, which I do not want.
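One way to investigate is to look at the tokens the analyzer actually emits. The request below is a minimal sketch, assuming the synonyms_test index defined above exists. It runs the _analyze API on the same query text and makes no claim about what a particular cluster will return.

```
GET synonyms_test/_analyze
{
  "analyzer": "synonyms_analyzer",
  "text": "Washington United States of America"
}
```

If no synonym tokens appear at the positions of "United States of America" in the response, the multi-word rule did not fire for that input.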
Answer 0 (score: 0)
You should use synonym replacement (an explicit => mapping) instead of a merged list of equivalents. So change
"USA, America, United States of America, The United States"
to
"America, United States of America, The United States=>USA"
For details, see the guide: https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-word-synonyms.html
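The replacement rule can be tried out before touching the index. The following is a sketch, assuming an Elasticsearch version whose _analyze API accepts inline filter definitions. The filter's own tokenizer setting is left at its default here, which is an assumption and differs from the keyword setting used in the question.

```
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "synonym",
      "synonyms": [
        "America, United States of America, The United States => USA"
      ]
    },
    "lowercase"
  ],
  "text": "Washington is capital of the United States of America"
}
```

Note that full_text uses synonyms_analyzer at index time as well as at search time, so after the filter is changed in the index settings the three documents need to be reindexed before a search reflects the new rule.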