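When searching our documents with Elasticsearch, we found that a search for "wave board" does not return good results, because documents containing "waveboard" are not at the top of the results. Google does this kind of "term combining". Is there a simple way to do this in ES?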
Answer 0: (score: 0)
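Found a good solution: create a custom analyzer with a shingle filter that uses "" as the token separator, and use it in the query (a bool query that combines it with a standard query).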
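A minimal sketch of what that setup might look like, written in the same older mapping syntax as the other answer (`"type": "string"`; on recent versions the field type would be `text` and the mapping has no type name). The index name `shingle_demo`, the sub-field `joined`, and the names `joined_shingle` and `joined_shingle_analyzer` are made up for illustration; only the `shingle` filter and its `min_shingle_size`, `max_shingle_size`, `output_unigrams`, and `token_separator` options come from Elasticsearch. The first body would be sent when creating the index (for example `PUT /shingle_demo`):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "joined_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": true,
          "token_separator": ""
        }
      },
      "analyzer": {
        "joined_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "joined_shingle"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "body": {
          "type": "string",
          "fields": {
            "joined": {
              "type": "string",
              "analyzer": "joined_shingle_analyzer"
            }
          }
        }
      }
    }
  }
}
```

With this analyzer, "wave board" should produce the tokens "wave", "waveboard", and "board", so it can match a document that contains "waveboard". A document containing "wave board" is likewise indexed with a "waveboard" token. A search body (for example for `POST /shingle_demo/_search`) could then combine the regular field with the shingled one:

```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "body": "wave board" } },
        { "match": { "body.joined": "wave board" } }
      ]
    }
  }
}
```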
Answer 1: (score: 0)
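To do this at analysis time, you can also use a so-called "decompounder" token filter. Here is an example that decompounds the text "catdogmouse" into the tokens "cat", "dog", and "mouse":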
POST /decom
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "decom_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["decom_filter"]
          }
        },
        "filter": {
          "decom_filter": {
            "type": "dictionary_decompounder",
            "word_list": ["cat", "dog", "mouse"]
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "body": {
          "type": "string",
          "analyzer": "decom_analyzer"
        }
      }
    }
  }
}
Then you can see how they are applied to some terms:
POST /decom/_analyze?field=body&pretty
racecatthings
{
  "tokens" : [ {
    "token" : "racecatthings",
    "start_offset" : 1,
    "end_offset" : 14,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "cat",
    "start_offset" : 1,
    "end_offset" : 14,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}
And another one (you should be able to extrapolate from this to split "waveboard" into "wave" and "board"):
POST /decom/_analyze?field=body&pretty
catdogmouse
{
  "tokens" : [ {
    "token" : "catdogmouse",
    "start_offset" : 1,
    "end_offset" : 12,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "cat",
    "start_offset" : 1,
    "end_offset" : 12,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "dog",
    "start_offset" : 1,
    "end_offset" : 12,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "mouse",
    "start_offset" : 1,
    "end_offset" : 12,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}
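Following that suggestion, here is a sketch of the same settings adapted to the original question. The only change is the `word_list`; the names `decom_analyzer` and `decom_filter` are kept from the example above, and the index name `waveboard_demo` used here is hypothetical. It would be sent as the body when creating the index (for example `POST /waveboard_demo`):

```json
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "decom_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["decom_filter"]
          }
        },
        "filter": {
          "decom_filter": {
            "type": "dictionary_decompounder",
            "word_list": ["wave", "board"]
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "body": {
          "type": "string",
          "analyzer": "decom_analyzer"
        }
      }
    }
  }
}
```

By analogy with the "catdogmouse" output above, analyzing "waveboard" against `body` should then give "waveboard", "wave", and "board", so a query for "wave board" can match it. This is untested, so treat it as a starting point. In practice the word list would need to cover every compound part you care about.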