Elasticsearch - documents matching more terms score lower than documents matching fewer terms

Date: 2017-02-10 11:25:30

Tags: elasticsearch

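I have a query that should return profiles with similar interests. The problem is that documents matching more terms get a lower score.

In the bool query's should clause I have interests = ['games', 'music', 'sport'].

The document with interests = ['games'] gets a score of 0.14981213.

The document with interests = ['games', 'music'] gets a score of 0.11516824.

Why is that? I am using AWS Elasticsearch, v2.3.2.

The query is: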

{
    "explain": true,
    "from": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "must_not": [
                            {
                                "term": {
                                    "id": 3918
                                }
                            }
                        ]
                    }
                }
            ],
            "should": [
                {
                    "terms": {
                        "interests": [
                            "games",
                            "music",
                            "sport"
                        ]
                    }
                }
            ]
        }
    },
    "size": 10
}
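To reproduce this, here is a minimal Python sketch that builds the same search request with only the standard library. The endpoint URL is a placeholder assumption (an AWS Elasticsearch domain may also require signed requests, depending on its access policy), and the optional search_type argument is simply passed through as a URL query parameter.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# The query from above, as a Python dict.
QUERY: Dict[str, Any] = {
    "explain": True,
    "from": 0,
    "query": {
        "bool": {
            "filter": [{"bool": {"must_not": [{"term": {"id": 3918}}]}}],
            "should": [{"terms": {"interests": ["games", "music", "sport"]}}],
        }
    },
    "size": 10,
}


def build_search_request(endpoint: str, index: str, body: Dict[str, Any],
                         search_type: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST <endpoint>/<index>/_search request."""
    url = f"{endpoint.rstrip('/')}/{urllib.parse.quote(index)}/_search"
    if search_type:
        url += "?" + urllib.parse.urlencode({"search_type": search_type})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical local endpoint; replace it with your own domain.
    req = build_search_request("http://localhost:9200", "test_44024988_profiles", QUERY)
    print(req.get_method(), req.full_url)
    # Sending it would be: json.load(urllib.request.urlopen(req))
```

Sending is left as a comment so the sketch runs without a cluster.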

And this is the result I got:

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_explanation": {
                    "description": "sum of:",
                    "details": [
                        {
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "description": "# clause",
                                    "details": [],
                                    "value": 0.0
                                },
                                {
                                    "description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
                                    "details": [
                                        {
                                            "description": "boost",
                                            "details": [],
                                            "value": 1.0
                                        },
                                        {
                                            "description": "queryNorm",
                                            "details": [],
                                            "value": 0.4494364
                                        }
                                    ],
                                    "value": 0.4494364
                                }
                            ],
                            "value": 0.0
                        },
                        {
                            "description": "product of:",
                            "details": [
                                {
                                    "description": "sum of:",
                                    "details": [
                                        {
                                            "description": "weight(interests:games in 1) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=1,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=2, maxDocs=3)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.4494364
                                                                }
                                                            ],
                                                            "value": 0.4494364
                                                        },
                                                        {
                                                            "description": "fieldWeight in 1, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=2, maxDocs=3)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=1)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 1.0
                                                        }
                                                    ],
                                                    "value": 0.4494364
                                                }
                                            ],
                                            "value": 0.4494364
                                        }
                                    ],
                                    "value": 0.4494364
                                },
                                {
                                    "description": "coord(1/3)",
                                    "details": [],
                                    "value": 0.33333334
                                }
                            ],
                            "value": 0.14981213
                        }
                    ],
                    "value": 0.14981213
                },
                "_id": "3917",
                "_index": "test_44024988_profiles",
                "_node": "urWXg5KhREyffYielaa6Rw",
                "_score": 0.14981213,
                "_shard": 2,
                "_source": {
                    "full_name": "Bob Doe",
                    "id": 3916,
                    "interests": [
                        "games"
                    ],
                    "user_id": 3917
                },
                "_type": "profile_document"
            },
            {
                "_explanation": {
                    "description": "sum of:",
                    "details": [
                        {
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "description": "# clause",
                                    "details": [],
                                    "value": 0.0
                                },
                                {
                                    "description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
                                    "details": [
                                        {
                                            "description": "boost",
                                            "details": [],
                                            "value": 1.0
                                        },
                                        {
                                            "description": "queryNorm",
                                            "details": [],
                                            "value": 0.9173473
                                        }
                                    ],
                                    "value": 0.9173473
                                }
                            ],
                            "value": 0.0
                        },
                        {
                            "description": "product of:",
                            "details": [
                                {
                                    "description": "sum of:",
                                    "details": [
                                        {
                                            "description": "weight(interests:games in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.9173473
                                                                }
                                                            ],
                                                            "value": 0.2814906
                                                        },
                                                        {
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 0.30685282
                                                        }
                                                    ],
                                                    "value": 0.08637618
                                                }
                                            ],
                                            "value": 0.08637618
                                        },
                                        {
                                            "description": "weight(interests:music in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.9173473
                                                                }
                                                            ],
                                                            "value": 0.2814906
                                                        },
                                                        {
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 0.30685282
                                                        }
                                                    ],
                                                    "value": 0.08637618
                                                }
                                            ],
                                            "value": 0.08637618
                                        }
                                    ],
                                    "value": 0.17275237
                                },
                                {
                                    "description": "coord(2/3)",
                                    "details": [],
                                    "value": 0.6666667
                                }
                            ],
                            "value": 0.11516824
                        }
                    ],
                    "value": 0.11516824
                },
                "_id": "3918",
                "_index": "test_44024988_profiles",
                "_node": "urWXg5KhREyffYielaa6Rw",
                "_score": 0.11516824,
                "_shard": 4,
                "_source": {
                    "full_name": "Alex Test",
                    "id": 3917,
                    "interests": [
                        "games",
                        "music"
                    ],
                    "user_id": 3918
                },
                "_type": "profile_document"
            },
            ... # not interesting doc
        ],
        "max_score": 0.14981213,
        "total": 3
    },
    "timed_out": false,
    "took": 3
}
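Each _explanation above is a tree of {description, value, details} nodes, which is hard to read as raw JSON. A small recursive walker (a sketch; the helper names walk, format_explanation and find are hypothetical) prints one indented line per node and can pull out specific factors, such as the idf lines, so the two hits are easier to compare.

```python
from typing import Any, Dict, Iterator, List, Tuple

Explanation = Dict[str, Any]


def walk(node: Explanation, depth: int = 0) -> Iterator[Tuple[int, float, str]]:
    """Yield (depth, value, description) for each node, parent before children."""
    yield depth, node["value"], node["description"]
    for child in node.get("details", []):
        yield from walk(child, depth + 1)


def format_explanation(node: Explanation) -> str:
    """Render the tree as indented 'value  description' lines."""
    return "\n".join(f"{'  ' * depth}{value:.8g}  {desc}"
                     for depth, value, desc in walk(node))


def find(node: Explanation, prefix: str) -> List[Tuple[float, str]]:
    """Return (value, description) for every node whose description starts with prefix."""
    return [(value, desc) for _, value, desc in walk(node) if desc.startswith(prefix)]


if __name__ == "__main__":
    # A trimmed copy of the first hit's "product of:" node.
    sample = {
        "description": "product of:",
        "value": 0.14981213,
        "details": [
            {"description": "sum of:", "value": 0.4494364, "details": []},
            {"description": "coord(1/3)", "value": 0.33333334, "details": []},
        ],
    }
    print(format_explanation(sample))
```

On a parsed response, calling find(hit["_explanation"], "idf(") for each hit puts the idf factors of the two documents side by side.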

My input data:

[{
    "full_name": "Bob Doe",
    "id": 3916,
    "interests": [
        "games"
    ],
    "user_id": 3917
}, {
    "full_name": "Alex Test",
    "id": 3917,
    "interests": [
        "games",
        "music"
    ],
    "user_id": 3918
}, {
    "full_name": "Joe Test",
    "id": 3918,
    "user_id": 3919
}]
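To load this input data into a fresh test index, the bulk API takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. This Python sketch builds that body. Using user_id as _id is an assumption inferred from the hits above (for example, _id "3917" belongs to the document with user_id 3917); the index and type names are copied from the results.

```python
import json
from typing import Any, Dict, List

DOCS: List[Dict[str, Any]] = [
    {"full_name": "Bob Doe", "id": 3916, "interests": ["games"], "user_id": 3917},
    {"full_name": "Alex Test", "id": 3917, "interests": ["games", "music"], "user_id": 3918},
    {"full_name": "Joe Test", "id": 3918, "user_id": 3919},
]


def build_bulk_body(docs: List[Dict[str, Any]], index: str, doc_type: str) -> str:
    """Return an NDJSON body for POST /_bulk (Elasticsearch 2.x still uses _type)."""
    lines: List[str] = []
    for doc in docs:
        # Assumption: _id mirrors user_id, as it appears to in the search hits.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc["user_id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body(DOCS, "test_44024988_profiles", "profile_document"), end="")
```

With the default of 5 primary shards (the response above reports "total": 5), these three documents are spread across different shards.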

1 answer:

Answer 0 (score: 0)

Let's look at the scoring formula used by Elasticsearch.

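$$
\mathrm{score}(q,d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q,d) \cdot \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big)
$$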

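This is Lucene's practical scoring function as presented in the Elasticsearch guide; if you are not familiar with it, that guide describes each factor. The explanation for your case is simple, though: the score is nothing more than this formula, i.e. the combination of all these factors (tf, idf, queryNorm, etc.). When the index is a toy index with only a few documents, the values can look very strange, because by default each shard computes idf and queryNorm from its own local statistics. Your explain output shows exactly that: the Bob Doe hit comes from shard 2 with idf(docFreq=2, maxDocs=3) and queryNorm 0.4494364, while the Alex Test hit comes from shard 4 with idf(docFreq=1, maxDocs=1) and queryNorm 0.9173473.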
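Plugging the numbers from the explain output into the formula makes this concrete. Both hits have tf = 1, boost = 1 and norm = 1, and the idf values match Lucene's classic TF-IDF idf, idf(t) = 1 + ln(maxDocs / (docFreq + 1)), which is the default similarity in Elasticsearch 2.x. Rounded, the two scores work out as:

$$
\begin{aligned}
\text{Bob Doe (shard 2):}\quad & \mathrm{idf}(\text{games}) = 1 + \ln\tfrac{3}{2+1} = 1.0 \\
& \mathrm{score} = 0.4494364 \cdot \tfrac{1}{3} \cdot \left(1 \cdot 1.0^2 \cdot 1 \cdot 1\right) \approx 0.14981213 \\[4pt]
\text{Alex Test (shard 4):}\quad & \mathrm{idf}(\text{games}) = \mathrm{idf}(\text{music}) = 1 + \ln\tfrac{1}{1+1} \approx 0.30685282 \\
& \mathrm{score} = 0.9173473 \cdot \tfrac{2}{3} \cdot 2\left(1 \cdot 0.30685282^2 \cdot 1 \cdot 1\right) \approx 0.11516824
\end{aligned}
$$

For the second document, coord doubles and the sum has two terms instead of one, and queryNorm roughly doubles too, but idf² drops from 1.0 to about 0.094, so the document with more matches ends up lower.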

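I could go into more depth, but the main point is that this is simply how the scoring formula works. Fixing it is a separate question, and you can do that by writing a different query.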
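A minimal Python sketch of the classic TF-IDF pieces involved (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), coord and queryNorm) reproduces both scores from the shard-local values in the explain output, then recomputes them with index-wide statistics. The index-wide counts (3 documents; games in 2, music in 1, sport in 0) are an assumption counted from the input data that ignores any deleted documents, so Elasticsearch's exact numbers may differ; only the ordering matters here.

```python
import math
from typing import Dict, List


def classic_tf(freq: float) -> float:
    # Classic Lucene tf: square root of the term frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1)), matching the explain output.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def query_norm(idfs: List[float]) -> float:
    # 1 / sqrt(sum of squared query weights), with every boost assumed to be 1.
    return 1.0 / math.sqrt(sum(i * i for i in idfs))


def term_score(freq: float, idf: float, q_norm: float, field_norm: float = 1.0) -> float:
    # queryWeight (idf * queryNorm) times fieldWeight (tf * idf * fieldNorm).
    return (idf * q_norm) * (classic_tf(freq) * idf * field_norm)


def should_score(matched: List[float], total_clauses: int) -> float:
    # Sum of the matching clauses, scaled by coord = matched / total.
    return sum(matched) * len(matched) / total_clauses


# 1) Shard-local statistics; queryNorm values are copied from the explain output.
bob_local = should_score([term_score(1, classic_idf(2, 3), 0.4494364)], 3)
alex_idf = classic_idf(1, 1)
alex_local = should_score([term_score(1, alex_idf, 0.9173473)] * 2, 3)

# Shard 4's queryNorm can also be recomputed: its only document lacks "sport",
# so idf(sport) = 1 + ln(1 / 1) = 1.0 there.
alex_norm_check = query_norm([alex_idf, alex_idf, classic_idf(0, 1)])

# 2) Hypothetical index-wide statistics counted from the input data.
DOC_FREQ: Dict[str, int] = {"games": 2, "music": 1, "sport": 0}
MAX_DOCS = 3
idf = {t: classic_idf(df, MAX_DOCS) for t, df in DOC_FREQ.items()}
q_norm = query_norm(list(idf.values()))
bob_global = should_score([term_score(1, idf["games"], q_norm)], 3)
alex_global = should_score([term_score(1, idf["games"], q_norm),
                            term_score(1, idf["music"], q_norm)], 3)

print(f"local:  Bob {bob_local:.8f}  Alex {alex_local:.8f}")
print(f"global: Bob {bob_global:.8f}  Alex {alex_global:.8f}")
```

With shared statistics the two-match document comes out on top. In Elasticsearch 2.x, search_type=dfs_query_then_fetch gathers index-wide term statistics before scoring, and a single-shard test index avoids the split altogether; either may be worth trying alongside a different query.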