Question

I have a field named comment in my index, and three documents whose comment values are:

world and hello
hello world
world world world

I want to sort the documents by word frequency. So if I search for world hello, the output should be:

    world world world
    hello world
    world and hello

world world world has a frequency of 3 (3 * world)
hello world has a frequency of 2 (1 * hello + 1 * world)
world and hello has a frequency of 2 (1 * world + 1 * hello)
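
The counting rule above can be checked with a small sketch. It assumes plain whitespace tokenization and lowercasing, and the helper name raw_tf_score is made up for illustration; it is not an Elasticsearch function.

```python
from collections import Counter

# The three comment values from the question.
docs = ["world and hello", "hello world", "world world world"]

def raw_tf_score(text, query):
    """Sum how often each distinct query word occurs in text."""
    counts = Counter(text.lower().split())
    return sum(counts[word] for word in set(query.lower().split()))

query = "world hello"
scores = {doc: raw_tf_score(doc, query) for doc in docs}

# Highest score first. The two documents that score 2 are tied,
# so their relative order is arbitrary under this rule.
ranked = sorted(docs, key=lambda d: scores[d], reverse=True)
for doc in ranked:
    print(scores[doc], doc)
```

Note that hello world and world and hello tie at 2, so this rule alone does not decide which of them comes first.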

I tried to do this with the following query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "comment": {
              "query": "hello world",
              "boost": 10.0
            }
          }
        }
      ]
    }
  }
}

But this gives me the output:

    hello world
    world world world
    world and hello

What am I doing wrong?

Answer 1

You are not doing anything wrong; Elasticsearch relevance scoring is simply more complicated than you might think.

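For example, when I reproduced your example I got different results, which can happen for many of the reasons mentioned in the documentation, such as field length, term frequency, and so on.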
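
To illustrate how field length and term rarity can outweigh raw counts, here is a rough sketch of textbook Okapi BM25 applied to the three comments. It assumes k1 = 1.2 and b = 0.75, whitespace tokenization, and statistics computed over a single shard; the thread does not say which similarity or version the cluster actually used, so the real scores may differ.

```python
import math

docs = ["world and hello", "hello world", "world world world"]
tokenized = [d.split() for d in docs]

K1, B = 1.2, 0.75  # assumed BM25 parameters, not taken from the thread
avg_len = sum(len(t) for t in tokenized) / len(tokenized)

def idf(term):
    """Inverse document frequency: rarer terms weigh more."""
    n = sum(1 for t in tokenized if term in t)
    return math.log(1 + (len(tokenized) - n + 0.5) / (n + 0.5))

def bm25(tokens, query_terms):
    """Textbook BM25 score of one tokenized document."""
    score = 0.0
    for term in query_terms:
        tf = tokens.count(term)
        # Longer-than-average fields are penalized through this term.
        norm = K1 * (1 - B + B * len(tokens) / avg_len)
        score += idf(term) * tf * (K1 + 1) / (tf + norm)
    return score

bm25_scores = {d: bm25(t, ["hello", "world"]) for d, t in zip(docs, tokenized)}
bm25_ranked = sorted(docs, key=bm25_scores.get, reverse=True)
for doc in bm25_ranked:
    print(round(bm25_scores[doc], 3), doc)
```

Under these assumptions world world world ranks last: world occurs in every document, so it carries little weight, and repeated occurrences saturate rather than add up linearly. The shorter hello world also outranks world and hello, even though both contain each query word once.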

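In your case, you can use custom scoring to rank by word count, although this requires finding the term frequency of each word and makes the query more complex.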
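
One possible way to do this, not necessarily the one meant above, is a scripted similarity that returns the raw term frequency, so that a match query sums the per-word counts. The sketch below only builds and prints the request bodies; it does not contact a cluster. It assumes an Elasticsearch version that supports scripted similarities exposing doc.freq and query.boost, and typeless mappings. The index name comments_index and the similarity name raw_tf are hypothetical.

```python
import json

# Index creation body: a scripted similarity that scores each matching
# term by its raw frequency in the field (assumption: doc.freq and
# query.boost are available to similarity scripts in your version).
index_body = {
    "settings": {
        "number_of_shards": 1,
        "similarity": {
            "raw_tf": {  # hypothetical similarity name
                "type": "scripted",
                "script": {"source": "return query.boost * doc.freq;"},
            }
        },
    },
    "mappings": {
        "properties": {
            "comment": {"type": "text", "similarity": "raw_tf"}
        }
    },
}

# Search body: a plain match query; the per-term scores are summed.
search_body = {"query": {"match": {"comment": "world hello"}}}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The first body would be sent when creating the index (PUT /comments_index) and the second to its _search endpoint. Because the similarity is part of the field mapping, existing documents would need to be reindexed into the new index.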

How to get documents based only on term frequency?