I am trying a simple test of Elasticsearch synonyms without success. Here is what I have so far:
POST /mysearch
{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 0,
    "analysis": {
      "filter" : {
        "my_ascii_folding" : {
          "type" : "asciifolding",
          "preserve_original" : true
        },
        "my_stopwords": {
          "type": "stop",
          "stopwords": [ ]
        },
        "mysynonym" : {
          "type" : "synonym",
          "synonyms" : [
            "foo => bar"
          ]
        }
      },
      "char_filter": {
        "my_htmlstrip": {
          "type": "html_strip"
        }
      },
      "analyzer": {
        "index_text_analyzer":{
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_stopwords", "my_ascii_folding" ]
        },
        "index_html_analyzer":{
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": "my_htmlstrip",
          "filter": [ "lowercase", "my_stopwords", "my_ascii_folding" ]
        },
        "search_text_analyzer":{
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "mysynonym", "lowercase", "my_stopwords" ]
        }
      }
    }
  },
  "mappings" : {
    "news" : {
      "_source" : { "enabled" : true },
      "_all" : {"enabled" : false},
      "properties" : {
        "name" : { "type" : "string", "index" : "analyzed", "store": "yes" , "analyzer": "index_text_analyzer" , "search_analyzer": "search_text_analyzer" }
      }
    }
  }
}
Add some documents:
POST /mysearch/news
{
"name":"foo kar"
}
POST /mysearch/news
{
"name":"bar kar"
}
Run a search:
POST /mysearch/_search?q=name:foo
{
}
The results I get match foo, not bar. Why?
Answer 0 (score: 3)
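I think you are doing a few things wrong. First, why "foo => bar"? That rule means foo is replaced with bar, whereas if the two are synonyms they should both be indexed. So I would use "foo,bar" instead. Let me give an example: suppose you index foo kar. Since bar is a synonym of foo, you want the synonym indexed as well, so that the index contains foo, bar and kar. That way, a search for either foo or bar finds the document, even though the original text does not contain bar.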
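To make the difference between the two rule styles concrete, here is a minimal Python sketch of how a synonym filter treats them. It is a toy model under simplifying assumptions (single-word terms, comma rules always expanded), not Elasticsearch's implementation, and the names parse_rule and apply_synonyms are made up for illustration.

```python
def parse_rule(rule):
    """Turn one synonym rule into a dict mapping a token to its output tokens.

    Toy model only: single-word terms, and comma rules are always expanded.
    """
    if "=>" in rule:
        # Explicit mapping: left-hand terms are REPLACED by the right-hand terms.
        left, right = rule.split("=>")
        targets = [t.strip() for t in right.split(",")]
        return {s.strip(): targets for s in left.split(",")}
    # Equivalent synonyms: every term expands to all terms in the group.
    group = [t.strip() for t in rule.split(",")]
    return {term: group for term in group}


def apply_synonyms(tokens, rule):
    """Apply a single rule to a token list; tokens without a rule pass through."""
    mapping = parse_rule(rule)
    out = []
    for token in tokens:
        out.extend(mapping.get(token, [token]))
    return out


print(apply_synonyms(["foo", "kar"], "foo => bar"))  # foo is replaced by bar
print(apply_synonyms(["foo", "kar"], "foo,bar"))     # foo is kept and bar is added
```

With "foo => bar" the token foo disappears from the output, which is why a rule of that form cannot make foo and bar interchangeable.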
That said, I suggest the following:
POST /mysearch
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        },
        "my_stopwords": {
          "type": "stop",
          "stopwords": []
        },
        "mysynonym": {
          "type": "synonym",
          "synonyms": [
            "foo,bar"
          ]
        }
      },
      "char_filter": {
        "my_htmlstrip": {
          "type": "html_strip"
        }
      },
      "analyzer": {
        "index_text_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stopwords",
            "my_ascii_folding"
          ]
        },
        "index_html_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": "my_htmlstrip",
          "filter": [
            "lowercase",
            "my_stopwords",
            "my_ascii_folding"
          ]
        },
        "search_text_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "mysynonym",
            "lowercase",
            "my_stopwords"
          ]
        }
      }
    }
  },
  "mappings": {
    "news": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": false
      },
      "properties": {
        "name": {
          "type": "string",
          "index": "analyzed",
          "store": "yes",
          "analyzer": "search_text_analyzer"
        }
      }
    }
  }
}
Or, if you do not want to index the synonyms and would rather index only the original text and apply synonyms at search time, make the following two changes to your original settings:
"synonyms": ["foo,bar"]
because, as mentioned above, "foo => bar" replaces foo with bar; and
"index_analyzer": "index_text_analyzer",
"search_analyzer": "search_text_analyzer"
With these two changes your text is indexed as is (without synonyms), but at search time, when you search for foo, Elasticsearch searches for its synonyms as well: foo or bar.
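The search-time variant can be pictured with a small Python sketch. It is only a toy model: tokenize stands in for the standard tokenizer plus the lowercase filter, search is a hypothetical helper rather than an Elasticsearch API, and multi-term queries are treated as OR.

```python
def tokenize(text):
    # Stand-in for the standard tokenizer plus the lowercase filter.
    return text.lower().split()


def search(docs, query, synonym_groups):
    """Return the documents containing any search-time expansion of a query term.

    Documents are indexed as is (no synonyms); only the query is expanded.
    """
    wanted = set()
    for term in tokenize(query):
        wanted.add(term)
        for group in synonym_groups:
            if term in group:
                wanted.update(group)
    return [doc for doc in docs if wanted & set(tokenize(doc))]


docs = ["foo kar", "bar kar"]
print(search(docs, "foo", [{"foo", "bar"}]))  # both documents match
print(search(docs, "foo", []))                # without synonyms only "foo kar" matches
```

The index itself never contains the synonym, so the synonym list can be changed later without reindexing the documents in this model.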