Question

From https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html we have the following function for calculating the score.

score(q,d)  =  
            queryNorm(q)  
          · coord(q,d)    
          · ∑ (           
                tf(t in d)   
              · idf(t)²      
              · t.getBoost() 
              · norm(t,d)    
            ) (t in q)

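However, the explain output for the example below seems inconsistent with this formula.

1) The explanation only shows idf, not idf².

2) Where is the coordination factor (coord)?

3) Judging from the explanation, the score seems to be calculated with the following formula: (tf * idf * fieldNorm) + (number of clauses * boost * queryNorm)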

Indexed document:

PUT test/type/1
{
  "text": "a b c"
}

Query:

GET test/type/_search
{
  "explain":"true",
  "query": {
    "match": {
      "text": "a"
    }
  }
}

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_shard": 3,
        "_node": "5QvbXVlRSku-p_g81ZXpjQ",
        "_index": "test",
        "_type": "type",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "text": "a b c"
        },
        "_explanation": {
          "value": 0.15342641,
          "description": "sum of:",
          "details": [
            {
              "value": 0.15342641,
              "description": "weight(text:a in 0) [PerFieldSimilarity], result of:",
              "details": [
                {
                  "value": 0.15342641,
                  "description": "fieldWeight in 0, product of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "tf(freq=1.0), with freq of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "termFreq=1.0",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 0.30685282,
                      "description": "idf(docFreq=1, maxDocs=1)",
                      "details": []
                    },
                    {
                      "value": 0.5,
                      "description": "fieldNorm(doc=0)",
                      "details": []
                    }
                  ]
                }
              ]
            },
            {
              "value": 0,
              "description": "match on required clause, product of:",
              "details": [
                {
                  "value": 0,
                  "description": "# clause",
                  "details": []
                },
                {
                  "value": 3.2588913,
                  "description": "_type:type, product of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "boost",
                      "details": []
                    },
                    {
                      "value": 3.2588913,
                      "description": "queryNorm",
                      "details": []
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

Answer 1

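1) You are missing one idf factor because your query has only one clause. The second multiplication by idf comes from the query weight, which you won't see in a query this simple. The second idf is cancelled out by queryNorm. queryNorm (simplified a bit) is 1 / √(∑ idf²), which for a single term becomes 1 / idf, so the query weight becomes idf / idf = 1. Note that the queryNorm value in your explain output, 3.2588913, is exactly 1 / 0.30685282. All of this is implicit: with only one clause there is nothing to weigh the term against, so the query weight doesn't need to be computed.
2) There is only one term in this query, so coord doesn't need to be considered. That is, coord = overlap / maxOverlap = 1/1 = 1.
3) I'm not sure where you got this formula from. I believe you are being thrown off a bit by the _type query. It appears to be a required clause added to restrict the search to a specific Elasticsearch type. Note that this clause has a score of zero. So all matches must belong to the specified _type, but that clause does not affect the score at all.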
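
To make the queryNorm cancellation and the coord value concrete, here is a minimal Python sketch of the arithmetic. It assumes the classic Lucene TF/IDF similarity, where idf = 1 + ln(maxDocs / (docFreq + 1)), and it uses the simplified queryNorm with all boosts equal to 1. The function names are my own and are not part of any Elasticsearch or Lucene API.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency, assuming the classic Lucene TF/IDF similarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def query_norm(idfs):
    # Simplified queryNorm: 1 / sqrt(sum of squared idf values), boosts taken as 1.
    return 1.0 / math.sqrt(sum(v * v for v in idfs))

def coord(overlap, max_overlap):
    # Fraction of query terms that the document matched.
    return overlap / max_overlap

# Values taken from the explain output: docFreq=1, maxDocs=1, one query term.
term_idf = idf(doc_freq=1, max_docs=1)   # roughly 0.3069
norm = query_norm([term_idf])            # roughly 3.2589, i.e. 1 / idf
query_weight = term_idf * norm           # idf / idf, roughly 1

print(term_idf, norm, query_weight, coord(1, 1))
```

With a single term the query weight collapses to about 1, which is why only one idf factor is left in the explanation.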

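If you want to see all the parts of the scoring algorithm at work, you need a test data set and queries that are closer to real-world conditions. This test has a single simple document and a single simple query. In this case, yes, the algorithm looks simple:

score = tf * idf * fieldNorm = 1 * 0.30685282 * 0.5 = 0.15342641

But you don't see coord, queryNorm, or the overall query weight calculation, because your query is too simple. You don't see a particularly meaningful idf (or tf), because there is only one document and one match. You don't see a sum, because you have one hit on one term, so there is nothing to sum. The algorithm is primarily designed to produce meaningful scores from larger data sets.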
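
As a sanity check, the sketch below plugs the numbers from the explain output into the full practical scoring function and compares the result with the simplified product. The tf, idf and fieldNorm values are copied from the explain output. The function name `practical_score` and the tuple layout are made up for illustration, and queryNorm is again the simplified form 1 / √(∑ idf²) with boosts left out.

```python
import math

def practical_score(terms, coord):
    """terms: list of (tf, idf, boost, norm) tuples, one per query term."""
    # Simplified queryNorm: 1 / sqrt(sum of squared idf values).
    query_norm = 1.0 / math.sqrt(sum(idf ** 2 for _, idf, _, _ in terms))
    # Sum over query terms of tf * idf^2 * boost * norm.
    total = sum(tf * idf ** 2 * boost * norm for tf, idf, boost, norm in terms)
    return query_norm * coord * total

# Single term "a": values as reported by the explain output.
tf, idf, boost, field_norm = 1.0, 0.30685282, 1.0, 0.5

full = practical_score([(tf, idf, boost, field_norm)], coord=1.0)
simplified = tf * idf * field_norm

print(full, simplified)
```

Both values come out at roughly 0.1534, the same as the _score in the result, so the explanation and the documented formula agree for this query.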

Elasticsearch score explanation is not the same as the actual scoring function
