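I am new to Elasticsearch, so please read the question before downvoting or marking it as a duplicate.
I am testing synonyms in Elasticsearch (v2.4.6), which I installed on Ubuntu 16.04. I supply the synonyms through a file named synonym.txt, which I placed in the config directory. I created an index, synonym_test, as follows -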
curl -XPOST localhost:9200/synonym_test/ -d '{
"settings": {
"analysis": {
"analyzer": {
"my_synonyms": {
"tokenizer": "whitespace",
"filter": ["lowercase","my_synonym_filter"]
}
},
"filter": {
"my_synonym_filter": {
"type": "synonym",
"ignore_case": true,
"synonyms_path" : "synonym.txt"
}
}
}
}
}'
The index contains two fields, id and some_text. I configured the field some_text with the custom analyzer as follows -
curl -XPUT localhost:9200/synonym_test/rulers/_mapping -d '{
"properties": {
"id": {
"type": "double"
},
"some_text": {
"type": "string",
"search_analyzer": "my_synonyms"
}
}
}'
Then I inserted some data -
curl -XPUT localhost:9200/synonym_test/external/5 -d '{
"id" : "5",
"some_text":"apple is a fruit"
}'
curl -XPUT localhost:9200/synonym_test/external/7 -d '{
"id" : "7",
"some_text":"english is spoken in england"
}'
curl -XPUT localhost:9200/synonym_test/external/8 -d '{
"id" : "8",
"some_text":"Scotland Yard is a popular game."
}'
curl -XPUT localhost:9200/synonym_test/external/9 -d '{
"id" : "9",
"some_text":"bananas contain potassium"
}'
The synonym.txt file contains the following -
"britain,england,scotland"
"fruit,bananas"
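After doing all this, when I run a query for the term fruit (which should also return the text containing bananas, since the two are synonyms in the file), I only get the text containing fruit.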
{
"took":117,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.8465736,
"hits":[
{
"_index":"synonym_test",
"_type":"external",
"_id":"5",
"_score":0.8465736,
"_source":{
"id":"5",
"some_text":"apple is a fruit"
}
}
]
}
}
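For reference, the expansion this query is expected to perform can be imitated outside Elasticsearch. The sketch below is a toy model in plain Python of a whitespace tokenizer, lowercasing, and comma-separated synonym groups; it is not Elasticsearch's implementation, and `parse_synonyms` and `expand_query` are hypothetical helper names. It also shows what this toy parser does if the surrounding double quotes are literally present in synonym.txt, which is only an assumption about the file.

```python
def parse_synonyms(lines):
    """Map each term to the full set of terms on its comma-separated line."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group = {term.strip().lower() for term in line.split(",") if term.strip()}
        for term in group:
            table.setdefault(term, set()).update(group)
    return table


def expand_query(text, table):
    """Split on whitespace, lowercase, then add the synonyms of each token."""
    expanded = set()
    for token in text.lower().split():
        # Tokens without a synonym group are kept unchanged.
        expanded |= table.get(token, {token})
    return expanded


# Synonym lines without quotes, and the same lines with literal quotes (assumption).
plain = parse_synonyms(["britain,england,scotland", "fruit,bananas"])
quoted = parse_synonyms(['"britain,england,scotland"', '"fruit,bananas"'])

print(sorted(expand_query("fruit", plain)))   # ['bananas', 'fruit']
print(sorted(expand_query("fruit", quoted)))  # ['fruit']
```

In this toy model, literal quotes turn the terms into `"fruit` and `bananas"`, so a search for fruit is never expanded. Whether that is the cause here depends on the real file contents.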
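I have also tried the following links, but none of them helped me - Synonym analyzer not working, Elasticsearch synonym analyzer not working, How to apply synonyms at query time instead of index time in Elasticsearch, how to configure the synonyms_path in elasticsearch, and many others.
So, can anyone tell me whether I am doing something wrong? Is there a problem with the settings or the synonym file? I want the synonyms to work at query time, so that when I search for a term, I get all documents related to that term.
Answer 0 (score: 0)
Please refer to the following URL: Custom Analyzer, to see how a custom analyzer is configured. If we follow the guidelines in that documentation, our schema will look like this: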
curl -XPOST localhost:9200/synonym_test/ -d '{
"settings": {
"analysis": {
"analyzer": {
"my_synonyms": {
"type": "custom",
"tokenizer": "whitespace",
"filter": ["lowercase","my_synonym_filter"]
}
},
"filter": {
"my_synonym_filter": {
"type": "synonym",
"ignore_case": true,
"synonyms_path" : "synonym.txt"
}
}
}
}
}'
This currently works on my Elasticsearch instance.
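One way to check what the analyzer actually emits is the `_analyze` endpoint. The sketch below only builds the request URL, using the query-string form (`analyzer` and `text` parameters) that Elasticsearch 2.x accepts; later versions may expect a JSON body instead. `analyze_url` is a hypothetical helper name, and the host `localhost:9200` is taken from the commands in the question.

```python
from urllib.parse import urlencode


def analyze_url(index, analyzer, text, host="localhost:9200"):
    """Build an _analyze URL in the query-string form used by Elasticsearch 2.x."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return "http://{}/{}/_analyze?{}".format(host, index, query)


# URL for running the custom analyzer from the index settings on the term "fruit".
url = analyze_url("synonym_test", "my_synonyms", "fruit")
print(url)
```

Passing the printed URL to `curl -XGET` should list the tokens the analyzer produces. If the synonym filter is loaded as intended, both fruit and bananas would be expected among them.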