How can I map one word to another in Elasticsearch? Suppose I have the following data document:
{
"carName" : "Porche",
"review": " this car is so awesome"
}
Now, when I search for good, fantastic, etc., it should map to "awesome". Is there any way to do this in Elasticsearch?
Answer 0 (score: 1)
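Yes, you can achieve this with the synonym token filter.
First, define a new custom analyzer in your index settings, then use that analyzer for the review field in your mapping.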
curl -XPUT localhost:9200/cars -d '{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"synonyms"
]
}
},
"filter": {
"synonyms": {
"type": "synonym",
"synonyms": [
"good, awesome, fantastic"
]
}
}
}
},
"mappings": {
"car": {
"properties": {
"carName": {
"type": "string"
},
"review": {
"type": "string",
"analyzer": "my_analyzer"
}
}
}
}
}'
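You can add as many synonyms as you like directly in the settings, or put them in a separate file that the settings reference through the synonyms_path property.
Then we can index your sample document: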
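A rule such as "good, awesome, fantastic" is written in the Solr synonym format, which is the synonym filter's default format for both inline rules and a synonyms_path file: a comma-separated list declares equivalent terms, while "a, b => c" replaces the left-hand terms with the right-hand ones. The Python sketch below is a simplified model of how those two rule shapes turn into a term-to-expansions table. It is not Elasticsearch code; the function name parse_solr_synonyms is hypothetical, and it ignores multi-word synonyms and escaping.

```python
from typing import Dict, List


def parse_solr_synonyms(rules: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each term to the terms it expands to.

    Simplified model of the Solr synonym format (single-word terms only,
    no escaping; lines starting with '#' are comments).
    """
    mapping: Dict[str, List[str]] = {}

    def add(term: str, targets: List[str]) -> None:
        # Merge targets into the term's entry without duplicates.
        entry = mapping.setdefault(term, [])
        for t in targets:
            if t not in entry:
                entry.append(t)

    for raw in rules:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: each left-hand term is replaced by the right-hand terms.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for term in (t.strip() for t in left.split(",")):
                if term:
                    add(term, targets)
        else:
            # Equivalent synonyms: every term in the group expands to the whole group.
            group = [t.strip() for t in line.split(",") if t.strip()]
            for term in group:
                add(term, group)
    return mapping


if __name__ == "__main__":
    rules = ["good, awesome, fantastic", "# a comment", "nice => awesome"]
    for term, expansions in parse_solr_synonyms(rules).items():
        print(term, "->", expansions)
```

With the equivalence rule from the answer, searching for any of the three words behaves the same, because each of them expands to the full group.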
curl -XPUT localhost:9200/cars/car/1 -d '{
"carName": "Porche",
"review": " this car is so awesome"
}'
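When the synonyms token filter runs, it indexes the tokens good and fantastic together with awesome, so a search for any of those tokens finds the document. Specifically, analyzing the sentence "this car is so awesome"...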
curl -XGET 'localhost:9200/cars/_analyze?analyzer=my_analyzer&pretty' -d 'this car is so awesome'
...produces the following tokens (note the last three, which all share position 5 and the offsets of "awesome"):
{
"tokens" : [ {
"token" : "this",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "car",
"start_offset" : 5,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "is",
"start_offset" : 9,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 3
}, {
"token" : "so",
"start_offset" : 12,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 4
}, {
"token" : "good",
"start_offset" : 15,
"end_offset" : 22,
"type" : "SYNONYM",
"position" : 5
}, {
"token" : "awesome",
"start_offset" : 15,
"end_offset" : 22,
"type" : "SYNONYM",
"position" : 5
}, {
"token" : "fantastic",
"start_offset" : 15,
"end_offset" : 22,
"type" : "SYNONYM",
"position" : 5
} ]
}
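The key detail in the output above is that the synonym tokens are stacked at the same position and offsets as the original word. The Python sketch below is a rough simulation of that token stream, not Elasticsearch's implementation: it approximates the standard tokenizer with a \w+ regex (the real tokenizer follows Unicode word-break rules), and it mirrors the output above by numbering positions from 1 and tagging every expanded token as SYNONYM. The names analyze and SYNONYM_GROUPS are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical synonym table mirroring the "good, awesome, fantastic" rule.
SYNONYM_GROUPS = [["good", "awesome", "fantastic"]]


def analyze(text: str, groups: List[List[str]]) -> List[Dict]:
    """Rough stand-in for a standard tokenizer followed by a synonym filter."""
    lookup = {term: group for group in groups for term in group}
    tokens: List[Dict] = []
    # Positions start at 1 to match the _analyze output shown above.
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        word = match.group()
        expansions = lookup.get(word)
        if expansions is None:
            tokens.append({"token": word, "start_offset": match.start(),
                           "end_offset": match.end(), "type": "<ALPHANUM>",
                           "position": position})
            continue
        # Every synonym reuses the original word's offsets and position.
        for syn in expansions:
            tokens.append({"token": syn, "start_offset": match.start(),
                           "end_offset": match.end(), "type": "SYNONYM",
                           "position": position})
    return tokens


if __name__ == "__main__":
    for t in analyze("this car is so awesome", SYNONYM_GROUPS):
        print(t["position"], t["token"], t["start_offset"], t["end_offset"], t["type"])
```

Because good, awesome and fantastic occupy one position, phrase-style queries still see a single word there, while term queries on any of the three match.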
Finally, you can search for and retrieve the document like this:
curl -XGET 'localhost:9200/cars/car/_search?q=review:good'
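Why the query for review:good finds a document that only says "awesome" can be modeled with a toy inverted index: the synonyms are written into the index at indexing time, and by default the field's analyzer is also applied to the query text. The Python sketch below is an illustrative simulation under those assumptions, not how Elasticsearch scores or combines terms; build_index, search and the second sample document are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical table: each word of the "good, awesome, fantastic" rule expands to all three.
GROUP = ["good", "awesome", "fantastic"]
SYNONYMS = {term: GROUP for term in GROUP}


def analyze(text: str) -> List[str]:
    """Split into \\w+ words and expand synonyms (toy stand-in for my_analyzer)."""
    terms: List[str] = []
    for word in re.findall(r"\w+", text):
        terms.extend(SYNONYMS.get(word, [word]))
    return terms


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: term -> ids of documents whose field contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Analyze the query the same way and OR together the matching documents."""
    hits: Set[str] = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


if __name__ == "__main__":
    docs = {"1": " this car is so awesome", "2": "this car is slow"}
    index = build_index(docs)
    print(sorted(search(index, "good")))   # document 1 only
    print(sorted(search(index, "car")))    # both documents
```

Expanding synonyms at index time, as the answer does, means changing the synonym list later requires reindexing for existing documents to pick up the new terms.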