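I have a query that should return profiles with similar interests. The problem is that documents matching more terms get a lower score.
In the bool query, my should clause searches for interests = ['games', 'music', 'sport'].
The document with interests = ['games'] scores 0.14981213, while the document with interests = ['games', 'music'] scores 0.11516824.
Why is that? I am using AWS Elasticsearch, v2.3.2.
The query is as follows: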
{
"explain": true,
"from": 0,
"query": {
"bool": {
"filter": [
{
"bool": {
"must_not": [
{
"term": {
"id": 3918
}
}
]
}
}
],
"should": [
{
"terms": {
"interests": [
"games",
"music",
"sport"
]
}
}
]
}
},
"size": 10
}
This is the result I get:
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_explanation": {
"description": "sum of:",
"details": [
{
"description": "match on required clause, product of:",
"details": [
{
"description": "# clause",
"details": [],
"value": 0.0
},
{
"description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
"details": [
{
"description": "boost",
"details": [],
"value": 1.0
},
{
"description": "queryNorm",
"details": [],
"value": 0.4494364
}
],
"value": 0.4494364
}
],
"value": 0.0
},
{
"description": "product of:",
"details": [
{
"description": "sum of:",
"details": [
{
"description": "weight(interests:games in 1) [PerFieldSimilarity], result of:",
"details": [
{
"description": "score(doc=1,freq=1.0), product of:",
"details": [
{
"description": "queryWeight, product of:",
"details": [
{
"description": "idf(docFreq=2, maxDocs=3)",
"details": [],
"value": 1.0
},
{
"description": "queryNorm",
"details": [],
"value": 0.4494364
}
],
"value": 0.4494364
},
{
"description": "fieldWeight in 1, product of:",
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"description": "termFreq=1.0",
"details": [],
"value": 1.0
}
],
"value": 1.0
},
{
"description": "idf(docFreq=2, maxDocs=3)",
"details": [],
"value": 1.0
},
{
"description": "fieldNorm(doc=1)",
"details": [],
"value": 1.0
}
],
"value": 1.0
}
],
"value": 0.4494364
}
],
"value": 0.4494364
}
],
"value": 0.4494364
},
{
"description": "coord(1/3)",
"details": [],
"value": 0.33333334
}
],
"value": 0.14981213
}
],
"value": 0.14981213
},
"_id": "3917",
"_index": "test_44024988_profiles",
"_node": "urWXg5KhREyffYielaa6Rw",
"_score": 0.14981213,
"_shard": 2,
"_source": {
"full_name": "Bob Doe",
"id": 3916,
"interests": [
"games"
],
"user_id": 3917
},
"_type": "profile_document"
},
{
"_explanation": {
"description": "sum of:",
"details": [
{
"description": "match on required clause, product of:",
"details": [
{
"description": "# clause",
"details": [],
"value": 0.0
},
{
"description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
"details": [
{
"description": "boost",
"details": [],
"value": 1.0
},
{
"description": "queryNorm",
"details": [],
"value": 0.9173473
}
],
"value": 0.9173473
}
],
"value": 0.0
},
{
"description": "product of:",
"details": [
{
"description": "sum of:",
"details": [
{
"description": "weight(interests:games in 0) [PerFieldSimilarity], result of:",
"details": [
{
"description": "score(doc=0,freq=1.0), product of:",
"details": [
{
"description": "queryWeight, product of:",
"details": [
{
"description": "idf(docFreq=1, maxDocs=1)",
"details": [],
"value": 0.30685282
},
{
"description": "queryNorm",
"details": [],
"value": 0.9173473
}
],
"value": 0.2814906
},
{
"description": "fieldWeight in 0, product of:",
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"description": "termFreq=1.0",
"details": [],
"value": 1.0
}
],
"value": 1.0
},
{
"description": "idf(docFreq=1, maxDocs=1)",
"details": [],
"value": 0.30685282
},
{
"description": "fieldNorm(doc=0)",
"details": [],
"value": 1.0
}
],
"value": 0.30685282
}
],
"value": 0.08637618
}
],
"value": 0.08637618
},
{
"description": "weight(interests:music in 0) [PerFieldSimilarity], result of:",
"details": [
{
"description": "score(doc=0,freq=1.0), product of:",
"details": [
{
"description": "queryWeight, product of:",
"details": [
{
"description": "idf(docFreq=1, maxDocs=1)",
"details": [],
"value": 0.30685282
},
{
"description": "queryNorm",
"details": [],
"value": 0.9173473
}
],
"value": 0.2814906
},
{
"description": "fieldWeight in 0, product of:",
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"description": "termFreq=1.0",
"details": [],
"value": 1.0
}
],
"value": 1.0
},
{
"description": "idf(docFreq=1, maxDocs=1)",
"details": [],
"value": 0.30685282
},
{
"description": "fieldNorm(doc=0)",
"details": [],
"value": 1.0
}
],
"value": 0.30685282
}
],
"value": 0.08637618
}
],
"value": 0.08637618
}
],
"value": 0.17275237
},
{
"description": "coord(2/3)",
"details": [],
"value": 0.6666667
}
],
"value": 0.11516824
}
],
"value": 0.11516824
},
"_id": "3918",
"_index": "test_44024988_profiles",
"_node": "urWXg5KhREyffYielaa6Rw",
"_score": 0.11516824,
"_shard": 4,
"_source": {
"full_name": "Alex Test",
"id": 3917,
"interests": [
"games",
"music"
],
"user_id": 3918
},
"_type": "profile_document"
},
... # not interesting doc
],
"max_score": 0.14981213,
"total": 3
},
"timed_out": false,
"took": 3
}
My input data:
[{
"full_name": "Bob Doe",
"id": 3916,
"interests": [
"games"
],
"user_id": 3917
}, {
"full_name": "Alex Test",
"id": 3917,
"interests": [
"games",
"music"
],
"user_id": 3918
}, {
"full_name": "Joe Test",
"id": 3918,
"user_id": 3919
}]
Answer 0 (score: 0)
Let's look at the scoring formula in Elasticsearch.
score(q,d) =
queryNorm(q)
· coord(q,d)
· ∑ (
tf(t in d)
· idf(t)²
· t.getBoost()
· norm(t,d)
) (t in q)
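The formula above is the practical scoring formula; if you are not familiar with it, the Elasticsearch documentation describes it. The explanation for your case is simple: the score is whatever this formula produces from the combination of all its factors (tf, idf, queryNorm, and so on). In your explain output the two hits report different idf and queryNorm values (idf 1.0 with maxDocs=3 for the first hit, idf 0.30685282 with maxDocs=1 for the second), and that difference outweighs the better coord factor (2/3 versus 1/3) of the document matching two terms. This tends to happen when the index is a toy one containing only a few documents, where these statistics come out very odd.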
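To make the arithmetic concrete, here is a minimal Python sketch that recomputes both scores from the numbers in the explain output of the question. The function names are hypothetical and chosen for illustration. The idf expression `1 + ln(maxDocs / (docFreq + 1))` is assumed to be the one used by Lucene's classic TF-IDF similarity; it does reproduce the idf values shown in the explain output. The queryNorm values are copied from the explain output rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed classic TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(idf, query_norm, tf=1.0, field_norm=1.0):
    # Per-term score as laid out in the explain tree:
    # queryWeight (idf * queryNorm) times fieldWeight (tf * idf * fieldNorm).
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


def doc_score(matched_idfs, query_norm, total_clauses):
    # coord = matched clauses / total clauses, applied to the sum of term scores.
    coord = len(matched_idfs) / total_clauses
    return coord * sum(term_score(idf, query_norm) for idf in matched_idfs)


# Hit on _shard 2 ("Bob Doe"): idf(docFreq=2, maxDocs=3), queryNorm 0.4494364.
idf_shard2 = classic_idf(doc_freq=2, max_docs=3)
# Hit on _shard 4 ("Alex Test"): idf(docFreq=1, maxDocs=1), queryNorm 0.9173473.
idf_shard4 = classic_idf(doc_freq=1, max_docs=1)

bob = doc_score([idf_shard2], query_norm=0.4494364, total_clauses=3)
alex = doc_score([idf_shard4, idf_shard4], query_norm=0.9173473, total_clauses=3)

print("one matching term :", round(bob, 8))
print("two matching terms:", round(alex, 8))
```

Because idf enters each term score squared, the drop from 1.0 to roughly 0.307 shrinks each term's contribution by about a factor of ten, which the doubled coord factor and the larger queryNorm do not make up for.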
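I could explain this in more depth, but the main point is that this is simply how the scoring formula behaves. Fixing it is a separate question; you can do that by writing the query differently.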
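As one possible way of writing the query differently, the sketch below builds a request body in which each interest becomes its own `constant_score` clause, so that a matching term contributes its boost instead of an idf-dependent weight. This is an assumption about what could help, not a fix confirmed by the question or the answer; the helper name `build_interest_query` and its parameters are hypothetical, and the behavior should be checked against the explain output on v2.3.2.

```python
import json


def build_interest_query(interests, exclude_id, size=10):
    # Hypothetical helper: one constant_score clause per interest, so the
    # score of a clause does not depend on per-shard term statistics.
    should = [
        {
            "constant_score": {
                "filter": {"term": {"interests": interest}},
                "boost": 1.0,
            }
        }
        for interest in interests
    ]
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                # Same exclusion filter as in the original query.
                "filter": [
                    {"bool": {"must_not": [{"term": {"id": exclude_id}}]}}
                ],
                "should": should,
            }
        },
    }


query = build_interest_query(["games", "music", "sport"], exclude_id=3918)
print(json.dumps(query, indent=2))
```

Another option that may be worth testing is running the original query with `search_type=dfs_query_then_fetch`, which asks Elasticsearch to gather term statistics across shards before scoring.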