I have two documents:
{
id: 7,
title: 'Wet',
description: 'asdfasdfasdf'
}
{
id: 6,
title: 'Wet wet',
description: 'asdfasdfasdf'
}
They are almost identical, except for the extra word in the second document.
My query is:
var qobject = {
query:{
custom_score:{
query:{
multi_match:{
query: q, //I searched for "wet"
fields: ['title','description'],
}
},
script: '_score '
}
}
}
OK, so when I run this query, I get these results:
{ total: 2,
max_score: 1.8472979,
hits:
[ { _index: 'products',
_type: 'products',
_id: '7',
_score: 1.9808292,
_source: [Object] },
{ _index: 'products',
_type: 'products',
_id: '6',
_score: 1.7508222,
_source: [Object] } ] }
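Why is id 7 ranked higher than id 6? What is the reasoning behind the score? Shouldn't id 6 rank higher, since its title has two words?
What if I want more words = more weight? How should I modify my query?
Here is the explanation: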
"_explanation": {
"value": 1.9808292,
"description": "custom score, product of:",
"details": [
{
"value": 1.9808292,
"description": "script score function: composed of:",
"details": [
{
"value": 1.9808292,
"description": "fieldWeight(title:wet in 0), product of:",
"details": [
{
"value": 1,
"description": "tf(termFreq(title:wet)=1)"
},
{
"value": 1.9808292,
"description": "idf(docFreq=2, maxDocs=8)"
},
{
"value": 1,
"description": "fieldNorm(field=title, doc=0)"
}
]
}
]
},
{
"value": 1,
"description": "queryBoost"
}
]
}
"_explanation": {
"value": 1.7508222,
"description": "custom score, product of:",
"details": [
{
"value": 1.7508222,
"description": "script score function: composed of:",
"details": [
{
"value": 1.7508222,
"description": "fieldWeight(title:wet in 0), product of:",
"details": [
{
"value": 1.4142135,
"description": "tf(termFreq(title:wet)=2)"
},
{
"value": 1.9808292,
"description": "idf(docFreq=2, maxDocs=8)"
},
{
"value": 0.625,
"description": "fieldNorm(field=title, doc=0)"
}
]
}
]
},
{
"value": 1,
"description": "queryBoost"
}
]
}
Answer 0 (score: 7)
Look at the explain output for your query to see why. You can use the explain API or add "explain": true to your current search request.
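As a minimal sketch of the second option, the snippet below adds the explain flag to a search body like the one in the question. The helper name withExplain is hypothetical, and the custom_score wrapper is left out for brevity; only the "explain": true key comes from the answer itself.

```javascript
// Hypothetical helper (not part of any client library): returns a copy of a
// search request body with explain output switched on.
function withExplain(body) {
  return Object.assign({}, body, { explain: true });
}

var q = 'wet';

// Same multi_match query as in the question, without the custom_score wrapper.
var qobject = {
  query: {
    multi_match: {
      query: q,
      fields: ['title', 'description']
    }
  }
};

var explained = withExplain(qobject);

// This is the body that would be sent to the search endpoint.
console.log(JSON.stringify(explained, null, 2));
```

Each hit in the response should then carry an _explanation tree like the ones shown in the question.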
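By default, Lucene uses the TF/IDF (term frequency, inverse document frequency) similarity to score documents. For every term that matches the query, several factors are taken into account. These are the most important ones, and they are the three that appear in your explain output:
- term frequency (tf): how often the term occurs in the field
- inverse document frequency (idf): how rare the term is across the index
- field norm (fieldNorm): index-time boost combined with length normalization, so shorter fields score higher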
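Depending on the query you run, documents can also score differently because of norms. You can disable norms in the mapping (and reindex), but then you also lose index-time boosting (which I suspect you are not using anyway). In fact, in your example the second document scores lower because its field norm is lower (0.625 versus 1), despite its higher term frequency (2 instead of 1).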
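The arithmetic behind the two explain outputs can be reproduced with a short sketch. It assumes the classic Lucene TF/IDF formulas, tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the values in the explain output. The fieldNorm values are copied from that output rather than computed, because Lucene stores norms in a lossy one-byte encoding (which is presumably why a two-term title shows 0.625 instead of 1/sqrt(2)). The function names are illustrative only.

```javascript
// Term frequency factor: square root of the raw count.
function tf(termFreq) {
  return Math.sqrt(termFreq);
}

// Inverse document frequency factor.
function idf(docFreq, maxDocs) {
  return 1 + Math.log(maxDocs / (docFreq + 1));
}

// fieldWeight as shown in the explain output: tf * idf * fieldNorm.
// fieldNorm is passed in as reported by explain, not derived here.
function fieldWeight(termFreq, docFreq, maxDocs, fieldNorm) {
  return tf(termFreq) * idf(docFreq, maxDocs) * fieldNorm;
}

// id 7, title 'Wet': termFreq=1, fieldNorm=1
var scoreDoc7 = fieldWeight(1, 2, 8, 1);

// id 6, title 'Wet wet': termFreq=2, fieldNorm=0.625
var scoreDoc6 = fieldWeight(2, 2, 8, 0.625);

// id 6 again, pretending norms were disabled (fieldNorm treated as 1)
var scoreDoc6NoNorms = fieldWeight(2, 2, 8, 1);

console.log(scoreDoc7.toFixed(4), scoreDoc6.toFixed(4), scoreDoc6NoNorms.toFixed(4));
```

The first two values agree with the scores in the question (about 1.98 and 1.75). With the norm taken out of the product, the two-word title comes out ahead, which is the "more words = more weight" behaviour you are after.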
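Another solution would be to plug in a different Lucene similarity: Lucene 4 offers more similarity implementations and also lets you define the similarity per field. These features have been exposed in elasticsearch 0.90.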
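For illustration, here is roughly what the two mapping-level options could look like for the title field. This is a sketch under assumptions: the omit_norms and similarity field settings are written as I recall them from the 0.90-era string mapping, so check them against the documentation for your version. The products type name is taken from the search results in the question. Either change requires reindexing.

```javascript
// Option 1 (assumed setting name: omit_norms): drop norms for the title
// field, so field length no longer lowers the score. Index-time boosts
// are lost as well.
var mappingWithoutNorms = {
  products: {
    properties: {
      title: { type: 'string', omit_norms: true }
    }
  }
};

// Option 2 (assumed setting name: similarity): pick a different similarity
// for the title field only. 'BM25' is one of the Lucene 4 similarities;
// whether it ranks these two documents the way you want has to be tested.
var mappingWithSimilarity = {
  products: {
    properties: {
      title: { type: 'string', similarity: 'BM25' }
    }
  }
};

console.log(JSON.stringify(mappingWithoutNorms));
console.log(JSON.stringify(mappingWithSimilarity));
```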