Question

I'm using the default analyzer and the default index settings. Suppose I have this simple mapping:

"question": {
    "properties": {
        "title": {
            "type": "string"
        },
        "answer": {
            "properties": {
                "text": {
                    "type": "string"
                }
            }
        }
    }
}

(This is just an example; apologies if it contains typos.)

Now I run the following search.

GET _search
{
    "query": {
        "query_string": {
            "query": "yes correct",
            "fields": ["answer.text"]
        }
    }
}

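The results rank a text value such as "Yes correct." (with the period, doc ID 1) above the plain "yes correct" (no period, doc ID 181). Both matches have the same score, but the hits array lists the one with the smaller doc ID first. I understand that the default index options include sorting by doc ID, so how do I exclude that one attribute and still use the rest of the default options?

I haven't set up any custom analyzers, so everything uses the Elasticsearch 2.0 defaults.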

Answer 1

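This is probably a use case for the Dis Max Query:

A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.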
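To make that definition concrete, here is a minimal Python sketch of how the sub-query scores are combined for one document: the best score, plus the tie breaker times the sum of the remaining scores. The function name and the sample scores are hypothetical, and the sketch leaves out boosts and query normalization.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine the scores of the sub-queries that matched one document.

    The result is the best sub-query score plus tie_breaker times the
    sum of all the other matching sub-query scores.
    """
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    others = sum(sub_scores) - best
    return best + tie_breaker * others


# Hypothetical scores, chosen only for illustration:
# one document matches both sub-queries, the other matches only the looser one.
both_matched = dis_max_score([0.30, 0.10], tie_breaker=0.7)
loose_only = dis_max_score([0.10], tie_breaker=0.7)
print(both_matched > loose_only)
```

With a tie breaker of 0 only the best sub-query counts; with a tie breaker of 1 the scores are simply added up.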

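So what you need is to score the answer as an exact match and give that the highest boost. You have to use a custom analyzer for this. This would be your mapping: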

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "question": {
      "properties": {
        "title": {
          "type": "string"
        },
        "answer": {
          "type": "object",
          "properties": {
            "text": {
              "type": "string",
              "analyzer": "my_keyword",
              "fields": {
                "stemmed": {
                  "type": "string",
                  "analyzer": "standard"
                }
              }
            }
          }
        }
      }
    }
  }
}
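To see why this mapping can tell the two answers apart, here is a rough Python approximation of what each analyzer emits. It is only a sketch: the function names are made up, `fold_ascii` only strips combining accents (the real asciifolding filter covers more characters), and the regex split is a stand-in for the standard tokenizer, not its actual algorithm.

```python
import re
import unicodedata


def fold_ascii(text):
    """Rough stand-in for the asciifolding filter: strip combining accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def my_keyword_tokens(text):
    """keyword tokenizer + asciifolding + lowercase: the whole value is one token."""
    return [fold_ascii(text).lower()]


def standard_like_tokens(text):
    """Approximation of the standard analyzer: split on non-word characters, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


for value in ("yes correct.", "yes correct"):
    print(repr(value), my_keyword_tokens(value), standard_like_tokens(value))
```

The keyword-based field keeps the trailing period, so only an identical string matches it, while the sub-field analyzed with the standard analyzer produces the same two tokens for both documents.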

Your test data:

PUT /test/question/1
{
  "title": "title nr1",
  "answer": [
    {
      "text": "yes correct."
    }
  ]
}

PUT /test/question/2
{
  "title": "title nr2",
  "answer": [
    {
      "text": "yes correct"
    }
  ]
}

Now, when you search for "yes correct." with a query like this:

POST /test/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "match": {
            "answer.text": {
              "query": "yes correct.",
              "type": "phrase"
            }
          }
        },
        {
          "match": {
            "answer.text.stemmed": {
              "query": "yes correct.",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}

You get this output:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.37919715,
      "hits": [
         {
            "_index": "test",
            "_type": "question",
            "_id": "1",
            "_score": 0.37919715,
            "_source": {
               "title": "title nr1",
               "answer": [
                  {
                     "text": "yes correct."
                  }
               ]
            }
         },
         {
            "_index": "test",
            "_type": "question",
            "_id": "2",
            "_score": 0.11261705,
            "_source": {
               "title": "title nr2",
               "answer": [
                  {
                     "text": "yes correct"
                  }
               ]
            }
         }
      ]
   }
}

If you run the same query without the trailing dot, so that it becomes "yes correct", you get this result:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.37919715,
      "hits": [
         {
            "_index": "test",
            "_type": "question",
            "_id": "2",
            "_score": 0.37919715,
            "_source": {
               "title": "title nr2",
               "answer": [
                  {
                     "text": "yes correct"
                  }
               ]
            }
         },
         {
            "_index": "test",
            "_type": "question",
            "_id": "1",
            "_score": 0.11261705,
            "_source": {
               "title": "title nr1",
               "answer": [
                  {
                     "text": "yes correct."
                  }
               ]
            }
         }
      ]
   }
}

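Hope this is what you're looking for.

By the way, I would recommend always using the Match query when doing text search. Taken from the documentation:

Comparison to query_string / field

The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does). Also, the phrase_prefix type can provide a great "as you type" behavior to automatically load search results.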

Answer 2

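Elasticsearch, or rather Lucene, scoring does not take the relative position of tokens into account. It uses three criteria to compute the score:

Term frequency - how often the search term occurs in the document.
Inverse document frequency - how often the search term occurs across the whole index. The more often it occurs, the more common the term is and the less important it is to the search.
Field length normalization - the number of tokens in the target field.

You can read more about it here.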
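As a rough illustration of those three factors, here is a Python sketch based on the classic Lucene TF/IDF formulas. The function names and the sample numbers are hypothetical, and the combined score is deliberately simplified: it ignores query normalization, the coordination factor, boosts, and the lossy encoding Lucene uses when it stores norms.

```python
import math


def term_frequency(freq):
    """tf: square root of how many times the term occurs in the field."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """idf: terms that occur in fewer documents get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_tokens):
    """norm: a match in a shorter field weighs more."""
    return 1.0 / math.sqrt(num_tokens)


def simplified_term_score(freq, num_docs, doc_freq, num_tokens):
    """Simplified per-term score: tf * idf^2 * norm (other factors omitted)."""
    idf = inverse_document_frequency(num_docs, doc_freq)
    return term_frequency(freq) * idf ** 2 * field_length_norm(num_tokens)


# Hypothetical index of 200 documents in which "yes" occurs in 20 of them.
# Both "yes correct." and "yes correct" analyze to two tokens with the
# standard analyzer, so every input to the score is the same for both.
with_period = simplified_term_score(freq=1, num_docs=200, doc_freq=20, num_tokens=2)
without_period = simplified_term_score(freq=1, num_docs=200, doc_freq=20, num_tokens=2)
print(with_period == without_period)
```

Since none of these inputs depends on punctuation that the analyzer discards, the two documents from the question end up with identical scores under the default analysis.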

How to make Elasticsearch rank/prefer exact string matches
