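I think it's best to explain my problem with an example.
Suppose I have created an index with a synonym analyzer, and I have declared "laptop", "phone" and "tablet" as similar words that are all mapped to "mobile":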
PUT synonym
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "phone, tablet, laptop => mobile"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "synonym": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "synonym"
        }
      }
    }
  }
}
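To make the problem concrete, here is a small Python toy model (not Elasticsearch code) of what the custom synonym analyzer does to each value. It assumes that the explicit mapping "phone, tablet, laptop => mobile" replaces each left-hand token with mobile and does not keep the original token; the names SYNONYM_RULES and synonym_analyzer are hypothetical.

```python
from typing import List

# Hypothetical toy model of the custom "synonym" analyzer defined above.
# Assumption: the explicit rule "phone, tablet, laptop => mobile" replaces
# each left-hand token with "mobile" and drops the original token.
SYNONYM_RULES = {"phone": "mobile", "tablet": "mobile", "laptop": "mobile"}


def synonym_analyzer(text: str) -> List[str]:
    """Whitespace tokenizer followed by the synonym filter (no lowercasing)."""
    tokens = text.split()  # the whitespace tokenizer splits on whitespace only
    return [SYNONYM_RULES.get(token, token) for token in tokens]


for word in ("phone", "tablet", "laptop"):
    print(word, "->", synonym_analyzer(word))
```

Since the same analyzer also runs on the query text, all three values and all three queries reduce to the single term mobile, so this field alone cannot distinguish an exact match from a synonym match.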
Now I create some documents:
PUT synonym/synonym/1
{
  "field1": "phone"
}
PUT synonym/synonym/2
{
  "field1": "tablet"
}
PUT synonym/synonym/3
{
  "field1": "laptop"
}
Now, whenever I run a match query for laptop, tablet or phone, the result is always:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "field1": "tablet"
        }
      },
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "field1": "phone"
        }
      },
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "3",
        "_score": 0.18232156,
        "_source": {
          "field1": "laptop"
        }
      }
    ]
  }
}
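As you can see, tablet always gets the highest score, even when I search for laptop.
I know that is because I declared them as similar words.
However, I am trying to figure out how to write the query so that documents containing the exact search term appear in the result list before documents that only contain a similar word.
It could be done with boosting, but there must be a simpler way.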
Answer:
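Multi-fields to the rescue.
Index field1 in two ways: once with the synonym analyzer and once with the standard analyzer.
Then you can simply use a bool query with should clauses, so that matches on field1 (synonyms) and on field1.raw (standard) both add to the score.
So your mapping should look like this: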
PUT synonym
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "phone, tablet, laptop => mobile"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "synonym": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "synonym",
          "fields": {
            "raw": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
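Here is a toy Python model (not Elasticsearch code) of the terms each sub-field would store with this mapping. It assumes the same synonym rule behaviour as the question, and it only approximates the standard analyzer as "split into word characters, then lowercase" (the real analyzer uses Unicode text segmentation). All function names are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical toy model of the two sub-fields in the mapping above.
# Assumption: "phone, tablet, laptop => mobile" replaces each left-hand token.
SYNONYM_RULES = {"phone": "mobile", "tablet": "mobile", "laptop": "mobile"}


def synonym_analyzer(text: str) -> List[str]:
    """Custom analyzer on field1: whitespace tokenizer + synonym filter."""
    return [SYNONYM_RULES.get(token, token) for token in text.split()]


def standard_analyzer_approx(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer on field1.raw:
    extract word-character runs and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def index_terms(value: str) -> Dict[str, List[str]]:
    """Terms each sub-field would store for a given field1 value."""
    return {
        "field1": synonym_analyzer(value),
        "field1.raw": standard_analyzer_approx(value),
    }


for value in ("phone", "tablet", "laptop"):
    print(value, index_terms(value))
```

field1 still collapses every value to mobile, while field1.raw keeps the original word, which is what lets a query reward exact matches.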
You can then query it like this:
GET synonym/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": "tablet"
          }
        },
        {
          "match": {
            "field1.raw": "tablet"
          }
        }
      ]
    }
  }
}
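The ranking effect of the two should clauses can be sketched in Python. This is a deliberate simplification: each matching clause contributes a constant 1.0 here, whereas Elasticsearch adds up the real BM25 score of every matching clause. The indexed terms are taken from the analyzers in the mapping above; INDEXED, SHOULD_CLAUSES and bool_should_score are hypothetical names.

```python
from typing import Dict, List, Tuple

# Terms stored per document for each sub-field (from the mapping above):
# field1 went through the synonym analyzer, field1.raw through standard.
INDEXED: Dict[str, Dict[str, List[str]]] = {
    "1": {"field1": ["mobile"], "field1.raw": ["phone"]},
    "2": {"field1": ["mobile"], "field1.raw": ["tablet"]},
    "3": {"field1": ["mobile"], "field1.raw": ["laptop"]},
}

# The two should clauses for the query text "tablet", after each sub-field's
# search analyzer: synonym analyzer -> "mobile", standard analyzer -> "tablet".
SHOULD_CLAUSES: List[Tuple[str, str]] = [("field1", "mobile"), ("field1.raw", "tablet")]

# Toy simplification: every matching clause contributes the same score.
CLAUSE_SCORE = 1.0


def bool_should_score(doc_terms: Dict[str, List[str]]) -> float:
    """Sum the scores of the should clauses that match this document."""
    return sum(CLAUSE_SCORE for field, term in SHOULD_CLAUSES if term in doc_terms[field])


ranking = sorted(INDEXED, key=lambda doc_id: bool_should_score(INDEXED[doc_id]), reverse=True)
for doc_id in ranking:
    print(doc_id, bool_should_score(INDEXED[doc_id]))
```

Every document matches the synonym clause, but only the exact match also matches the field1.raw clause, so it ends up on top without any explicit boost.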
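Note: I used search_type=dfs_query_then_fetch. Because you are testing on 3 shards with very few documents, the scores you get are not the ones the documents should get: by default, term statistics such as document frequency are computed per shard. You can use dfs_query_then_fetch while testing, but it is discouraged in production. See: https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch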
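The two scores in the question can be reproduced from Lucene's BM25 idf formula, log(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents with the field and n the number containing the term, both counted per shard by default. This is a sketch under two assumptions: doc 2 sits alone on one shard while docs 1 and 3 share another (the layout that appears consistent with the scores shown), and the term-frequency part of BM25 contributes a factor of 1 because every field holds exactly one term, as in the BM25 variant these results appear to come from.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene BM25 idf: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumed shard layout: doc 2 alone on shard A, docs 1 and 3 on shard B.
# Every document's field1 holds the single term "mobile", so n == N on each shard.
shard_a = bm25_idf(doc_count=1, doc_freq=1)  # doc 2
shard_b = bm25_idf(doc_count=2, doc_freq=2)  # docs 1 and 3
# Index-wide statistics, as gathered by dfs_query_then_fetch.
global_idf = bm25_idf(doc_count=3, doc_freq=3)

print(f"shard A: {shard_a:.7f}")
print(f"shard B: {shard_b:.8f}")
print(f"global:  {global_idf:.8f}")
```

Shard A prints 0.2876821 and shard B prints 0.18232156, matching the question's output. With the index-wide statistics every document gets the same idf for mobile, so the tie between them is no longer broken by shard placement.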