Question

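When I run a match query on a nested field, is each nested document's relevance score computed against all nested documents across all root documents, or only against the nested documents under a single root document? In other words, when TF/IDF is calculated, what is the scope of the collection used for IDF?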

Here is the mapping, with posts as a nested field:

PUT /channels
{
  "mappings": {
    "channel": {
      "properties": {
        "username": { "type": "string" },
        "posts": {
          "type": "nested", 
          "properties": {
            "link":    { "type": "string" },
            "caption": { "type": "string" }
          }
        }
      }
    }
  }
}

Here is the query:

GET channels/_search
{
  "query": {
    "nested": {
      "path": "posts",
      "query": {
        "match": {
          "posts.caption": "adidas"
        }
      },
      "inner_hits": {}
    }
  }
}

However, in my results the first document gets the higher root score, even though the second document's highest inner-hit score is higher.

{
  "hits": {
    "total": 2,
    "max_score": 4.3327584,
    "hits": [
      {
        "_index": "channels",
        "_type": "channel",
        "_id": "1",
        "_score": 4.3327584,
        "_source": {
          "username": "user1",
          "posts": [...]
        },
        "inner_hits": {
          "posts": {
            "hits": {
              "total": 2,
              "max_score": 5.5447335,
              "hits": [...]
            }
          }
        }
      },
      {
        "_index": "channels",
        "_type": "channel",
        "_id": "2",
        "_score": 4.2954993,
        "_source": {
          "username": "user2",
          "posts": [...]
        },
        "inner_hits": {
          "posts": {
            "hits": {
              "total": 13,
              "max_score": 11.446381,
              "hits": [...]
            }
          }
        }
      }
    ]
  }
}

Answer 1

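After running explain on my query, I can see that the TF/IDF scores of the inner hits are indeed calculated with an IDF taken over the nested documents of all root documents, not just those under one root.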
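To make the difference between the two scopes concrete, here is a minimal Python sketch. It assumes the IDF form used by Lucene's classic TF/IDF similarity, 1 + ln(numDocs / (docFreq + 1)); the channel names and counts are hypothetical and are not taken from my index. It is also simplified: the real statistics are gathered per shard and may count more than just the nested posts.

```python
import math

def classic_idf(doc_freq, num_docs):
    """IDF in the form 1 + ln(numDocs / (docFreq + 1)) (assumed classic similarity)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical counts: (posts in the channel, posts whose caption contains "adidas").
channels = {
    "channel_a": (10, 2),
    "channel_b": (40, 13),
}

# Scope 1: one IDF shared by every post, computed over all nested documents.
total_posts = sum(posts for posts, _ in channels.values())
total_matching = sum(matching for _, matching in channels.values())
global_idf = classic_idf(total_matching, total_posts)

# Scope 2 (not what explain shows): a separate IDF per root document.
per_root_idf = {
    name: classic_idf(matching, posts)
    for name, (posts, matching) in channels.items()
}

print(global_idf)
print(per_root_idf)
```

With a global scope, every post is scored with the same IDF for the term, so inner-hit scores from different root documents are directly comparable.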

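For the root document's score, the default score mode for nested documents is to average the scores of the matching nested documents. That explains the ranking in my results: a root document with one very high-scoring post and many low-scoring ones can have a lower average than a root document with fewer matches. If I want the root score to be the maximum nested score instead, I can set score_mode on the nested query. The query below shows how to run explain on a document and how to set a different score mode.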

GET channels/channel/1/_explain
{
  "query": {
    "nested": {
      "path": "posts",
      "score_mode": "max", 
      "query": {
        "match": {
          "posts.caption": "adidas"
        }
      },
      "inner_hits": {}
    }
  }
}
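The effect of the score mode on root ranking can be sketched in a few lines of Python. The per-post scores below are hypothetical (the real inner-hit scores are elided in my results), and root_score is only an illustration of the avg and max modes, not Elasticsearch code.

```python
from statistics import mean

# Hypothetical scores of the matching nested posts under each root document.
nested_scores = {
    "channel_a": [5.0, 3.0],
    "channel_b": [9.0, 2.0, 1.0, 1.0],
}

def root_score(scores, score_mode="avg"):
    """Combine the scores of matching nested documents into one root score."""
    if score_mode == "avg":
        return mean(scores)
    if score_mode == "max":
        return max(scores)
    raise ValueError(f"unsupported score_mode: {score_mode}")

def rank(score_mode):
    """Return root document names ordered by root score, highest first."""
    return sorted(
        nested_scores,
        key=lambda name: root_score(nested_scores[name], score_mode),
        reverse=True,
    )

print(rank("avg"))  # channel_a ranks first (4.0 vs 3.25)
print(rank("max"))  # channel_b ranks first (9.0 vs 5.0)
```

Under avg, the many low-scoring posts drag channel_b below channel_a even though it holds the single best post; under max, the order flips.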

How is the relevance score (TF/IDF) of nested documents calculated in Elasticsearch?