Question

I think I should explain my problem with an example:

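Suppose I have created an index with a synonym analyzer, and I have declared that "laptop", "phone" and "tablet" are similar words that can be generalized as "mobile":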

PUT synonym
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "phone, tablet, laptop => mobile"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "synonym": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "synonym"
        }
      }
    }
  }
}
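To see why every document matches the same way, here is a minimal Python sketch of what the "synonym" analyzer above does to each value. It is a simplified model, not Elasticsearch code: the names SYNONYM_RULES, whitespace_tokenize and synonym_analyze are hypothetical, and multi-word synonyms are ignored.

```python
# Hypothetical, simplified model of the "synonym" analyzer defined above:
# a whitespace tokenizer followed by a synonym filter with the explicit
# mapping "phone, tablet, laptop => mobile". This is not Elasticsearch code.

SYNONYM_RULES = {"phone": "mobile", "tablet": "mobile", "laptop": "mobile"}


def whitespace_tokenize(text: str) -> list[str]:
    # The whitespace tokenizer splits on whitespace only; it does not lowercase.
    return text.split()


def synonym_analyze(text: str) -> list[str]:
    # With "=>" (explicit mapping) the left-hand terms are *replaced* by the
    # right-hand term, so the original word is not kept in the token stream.
    return [SYNONYM_RULES.get(token, token) for token in whitespace_tokenize(text)]


if __name__ == "__main__":
    for doc in ["phone", "tablet", "laptop"]:
        print(doc, "->", synonym_analyze(doc))
```

Under this model each of the three documents is indexed as the single term mobile, and because the same analyzer is used at search time, a query for tablet is also rewritten to mobile. The indexed term alone therefore cannot tell the documents apart.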

Now I index some documents:

PUT synonym/synonym/1
{
    "field1" : "phone"
}
PUT synonym/synonym/2
{
    "field1" : "tablet"
}
PUT synonym/synonym/3
{
    "field1" : "laptop"
}

Now, when I run a match query for laptop, tablet or phone, the result is always:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "field1": "tablet"
        }
      },
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "field1": "phone"
        }
      },
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "3",
        "_score": 0.18232156,
        "_source": {
          "field1": "laptop"
        }
      }
    ]
  }
}

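As you can see, tablet always gets the highest score, even when I search for laptop.

I know that is because I declared them as similar words.

However, I am trying to figure out how to write the query so that documents containing the actual search term appear in the result list before documents that only match one of its synonyms.

It could be done with boosting, but there must be a simpler way.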

Answer 1

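Multi-fields to the rescue. Index field1 in two ways: once with the synonym analyzer and once with the standard analyzer. Then you can simply use a bool query with should clauses, so that a match on field1 (synonyms) and a match on field1.raw (standard) add to the score. Your mapping should look like this: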

PUT synonym
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "phone, tablet, laptop => mobile"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "synonym": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "synonym",
          "fields": {
            "raw": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
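As a rough illustration of what this multi-field mapping means at index time, the Python sketch below analyzes one source value twice, once per (sub-)field. It is a simplified model under stated assumptions: the helper names are hypothetical, and standard_analyzer only approximates the real standard analyzer (which uses Unicode text segmentation) with a regex split plus lowercasing.

```python
import re

SYNONYM_RULES = {"phone": "mobile", "tablet": "mobile", "laptop": "mobile"}


def synonym_analyzer(text: str) -> list[str]:
    # Simplified "synonym" analyzer: whitespace tokenizer + "=>" synonym mapping.
    return [SYNONYM_RULES.get(t, t) for t in text.split()]


def standard_analyzer(text: str) -> list[str]:
    # Rough approximation of the "standard" analyzer: split on non-word
    # characters and lowercase. The real analyzer uses Unicode segmentation.
    return [t.lower() for t in re.findall(r"\w+", text)]


def index_document(value: str) -> dict[str, list[str]]:
    # The same source value is analyzed once per (sub-)field, which is what
    # the "fields" section of the mapping asks for.
    return {
        "field1": synonym_analyzer(value),
        "field1.raw": standard_analyzer(value),
    }


if __name__ == "__main__":
    for value in ["phone", "tablet", "laptop"]:
        print(value, index_document(value))
```

The point of the design is that field1.raw keeps the original word (tablet), while field1 still holds the generalized term (mobile), so a query can reward both kinds of match.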

You can then query it like this:

GET synonym/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": "tablet"
          }
        },
        {
          "match": {
            "field1.raw": "tablet"
          }
        }
      ]
    }
  }
}
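The following Python sketch models why this bool query ranks the tablet document first: the scores of all matching should clauses are added up, and only the document that literally contains tablet matches the field1.raw clause. Assumptions: global statistics over all three documents (as with dfs_query_then_fetch), Lucene's BM25 idf formula, and a tf/length factor that is identical for these one-token documents and is therefore left out. The names DOCS, match_score and bool_should are hypothetical.

```python
import math

# Analyzed terms per document under the multi-field mapping
# (field1 = synonym analyzer, field1.raw = standard analyzer).
DOCS = {
    "1": {"field1": {"mobile"}, "field1.raw": {"phone"}},
    "2": {"field1": {"mobile"}, "field1.raw": {"tablet"}},
    "3": {"field1": {"mobile"}, "field1.raw": {"laptop"}},
}
SYNONYM_RULES = {"phone": "mobile", "tablet": "mobile", "laptop": "mobile"}


def idf(doc_count: int, doc_freq: int) -> float:
    # BM25 idf as computed by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5)).
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def match_score(field: str, term: str, doc_id: str) -> float:
    # Single-term match. Assumption: the BM25 tf/length factor is the same
    # for every document here, so only idf is modelled.
    if term not in DOCS[doc_id][field]:
        return 0.0
    doc_freq = sum(term in d[field] for d in DOCS.values())
    return idf(len(DOCS), doc_freq)


def bool_should(query: str) -> list[tuple[str, float]]:
    # A bool query with should clauses sums the scores of matching clauses.
    synonym_term = SYNONYM_RULES.get(query, query)
    raw_term = query.lower()
    scores = {
        doc_id: match_score("field1", synonym_term, doc_id)
        + match_score("field1.raw", raw_term, doc_id)
        for doc_id in DOCS
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in bool_should("tablet"):
        print(doc_id, round(score, 4))
```

In this model all three documents get the same small score from the synonym clause (mobile appears everywhere, so its idf is low), while the rare term tablet on field1.raw adds a much larger idf only to document 2.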

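Note: I used search_type=dfs_query_then_fetch. Because you are testing with 3 shards and very few documents, the scores you get are not the scores the documents should get. This is because term statistics (such as document frequencies) are calculated per shard. You can use dfs_query_then_fetch while testing, but it is discouraged in production. See: https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch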
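The per-shard effect can be checked with a small worked calculation in Python. Assumptions: the shard layout below (tablet alone on one shard, phone and laptop together on another, the third shard empty) is inferred from the scores in the question, not known; idf uses Lucene's BM25 formula; and the tf/length factor is taken as 1, which holds under the older Lucene BM25 formula for a one-token field and is consistent with the reported scores. SHARDS, per_shard_scores and dfs_scores are hypothetical names.

```python
import math


def idf(doc_count: int, doc_freq: int) -> float:
    # BM25 idf as computed by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5)).
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumed shard layout, inferred from the scores in the question.
SHARDS = {"shard_a": ["2"], "shard_b": ["1", "3"], "shard_c": []}


def per_shard_scores() -> dict[str, float]:
    # query_then_fetch: each shard scores with its own statistics only.
    # Every document contains "mobile", so doc_freq == doc_count per shard.
    scores = {}
    for docs in SHARDS.values():
        for doc_id in docs:
            scores[doc_id] = idf(len(docs), len(docs))
    return scores


def dfs_scores() -> dict[str, float]:
    # dfs_query_then_fetch: statistics are first gathered from all shards.
    total = sum(len(docs) for docs in SHARDS.values())
    return {doc_id: idf(total, total) for docs in SHARDS.values() for doc_id in docs}


if __name__ == "__main__":
    print("per shard:", per_shard_scores())
    print("dfs:", dfs_scores())
```

With per-shard statistics this reproduces the 0.2876821 (ln(4/3)) for tablet and 0.18232156 (ln(1.2)) for phone and laptop seen in the question; with global statistics all three documents score ln(8/7), about 0.1335, so the ordering in the question comes from shard placement rather than from the synonyms.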

Get documents with the search term first, before its synonyms [Elastic]