Get documents containing the search term first, before those matching its synonyms [Elastic]

Time: 2018-01-04 07:53:54

Tags: elasticsearch search elasticsearch-5 analyzer synonym

I think it is easiest to explain my problem with an example:

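Suppose I have created an index with a synonym analyzer, and I declare that "laptop", "phone" and "tablet" are similar words that can all be generalized to "mobile":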

PUT synonym
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "phone, tablet, laptop => mobile"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "synonym": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "synonym"
        }
      }
    }
  }
}

Now I create some documents:

PUT synonym/synonym/1
{
    "field1" : "phone"
}
PUT synonym/synonym/2
{
    "field1" : "tablet"
}
PUT synonym/synonym/3
{
    "field1" : "laptop"
}

Now, whether I run a match query for laptop, tablet or phone, the result is always:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "field1": "tablet"
        }
      },
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "field1": "phone"
        }
      },
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "3",
        "_score": 0.18232156,
        "_source": {
          "field1": "laptop"
        }
      }
    ]
  }
}

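As you can see, the ranking has nothing to do with the term I actually searched for. Whichever of the three words I search for, the documents come back in the same order with the same scores.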

I know that this happens because I declared them as synonyms.
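The _analyze API shows why. The following is a minimal check that assumes the synonym index created above. The rule phone, tablet, laptop => mobile is an explicit mapping, so the synonym analyzer should replace each of the three words with the single token mobile. As a result, all three documents are indexed identically, and a query for any of the three words is rewritten to mobile as well.

GET synonym/_analyze
{
  "analyzer": "synonym",
  "text": "tablet"
}

Running the same request with "text": "laptop" or "text": "phone" should return the same mobile token.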

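However, I am trying to figure out how to write the query so that documents containing the actual search term appear at the top of the result list, ahead of documents that only match one of its synonyms.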

This could probably be done with boosting, but there must be a simpler way.

1 answer:

Answer (score: 2):

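Multi-fields to the rescue. Index field1 in two ways: once with the synonym analyzer and once with the standard analyzer. Then use a bool query with should clauses so that a match on field1 (synonyms) and a match on field1.raw (standard) each add to the score. A document that contains the exact search term matches both clauses, so it scores higher than a document that matches only through a synonym. Your mapping should look like this: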

PUT synonym
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "phone, tablet, laptop => mobile"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "synonym": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "synonym",
          "fields": {
            "raw": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
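As a sanity check, the _analyze API can run a sub-field's analyzer by field name. This assumes the old index was deleted (DELETE synonym) before running the PUT above, because PUT fails if the index already exists, and that the three documents were indexed again afterwards. With that setup, field1 should still produce mobile, while field1.raw should keep the original word tablet. That retained word is what lets the second should clause reward exact matches.

GET synonym/_analyze
{
  "field": "field1.raw",
  "text": "tablet"
}

Replacing "field1.raw" with "field1" should return mobile instead.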

You can then query it like this:

GET synonym/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": "tablet"
          }
        },
        {
          "match": {
            "field1.raw": "tablet"
          }
        }
      ]
    }
  }
}
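The same scoring can also be written as a multi_match query of type most_fields. The Elasticsearch documentation describes this type as running a match on each listed field and combining the scores, much like the bool/should query above. The following sketch assumes the index and mapping above:

GET synonym/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "multi_match": {
      "query": "tablet",
      "type": "most_fields",
      "fields": ["field1", "field1.raw"]
    }
  }
}

With either form, the tablet document matches both fields: the synonym token mobile in field1 and the exact token tablet in field1.raw. The phone and laptop documents match only field1, so tablet should rank first.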

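Note: I used search_type=dfs_query_then_fetch. Because you are testing on 3 shards with very few documents, the scores you get are not the ones the documents should get. This is because each shard computes term frequencies from its own documents only. You can use dfs_query_then_fetch while testing, but it is discouraged in production. See: https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch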
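To see the per-shard statistics at work, you can add "explain": true to the original search, which uses the default query_then_fetch. This is a sketch. For each hit, the response should break the score into an idf part and a tfNorm part, and it should list the docFreq and docCount that the idf was computed from on that hit's shard.

GET synonym/_search
{
  "explain": true,
  "query": {
    "match": {
      "field1": "tablet"
    }
  }
}

The scores in the question are consistent with this, assuming the default BM25 similarity (k1 = 1.2, b = 0.75). Each document holds exactly one token (mobile), so the tfNorm factor is 1 × (1.2 + 1) / (1 + 1.2) = 1, and the score equals the idf, ln(1 + (docCount − docFreq + 0.5) / (docFreq + 0.5)). A shard holding only the tablet document gives ln(1 + 0.5 / 1.5) = ln(4/3) ≈ 0.2876821. A shard holding both phone and laptop gives ln(1 + 0.5 / 2.5) = ln(1.2) ≈ 0.18232156. These are exactly the two values in the output. With dfs_query_then_fetch, all shards use the global docCount = docFreq = 3, so every document gets the same idf, ln(1 + 0.5 / 3.5) = ln(8/7).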