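I'm building a search engine with Lucene and it's going well, but I have to implement an algorithm that scores results based on their relevancy and age. I have three inputs:
What I'm currently doing is basically: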
ageOfDocumentInHours = age / 3600; //this is to avoid any overflows
ageModifier = ageOfDocumentInHours * ageScew + 1; // an ageScew of 0 results in relevancy * 1
overallScore = relevancy * ageModifier;
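For reference, the same calculation can be written as a small self-contained function. This is only a sketch: the class and method names are made up for illustration, and it assumes that `age` is given in seconds (which the division by 3600 suggests) and that all values are doubles.

```csharp
using System;

// Hypothetical helper; the names are illustrative and not part of Lucene.
public static class AgeWeightedScore
{
    // relevancy:    relevance score of the hit
    // ageInSeconds: document age, assumed to be in seconds
    // ageScew:      weight given to age; 0 leaves relevancy unchanged
    public static double Compute(double relevancy, double ageInSeconds, double ageScew)
    {
        double ageOfDocumentInHours = ageInSeconds / 3600.0;
        double ageModifier = ageOfDocumentInHours * ageScew + 1.0;
        return relevancy * ageModifier;
    }
}
```

For example, `AgeWeightedScore.Compute(2.0, 7200, 0.5)` gives 2.0 * (2 * 0.5 + 1) = 4.0. Written this way, a positive `ageScew` raises the score of older documents; a negative value penalises age instead, but the modifier then reaches zero or goes negative once a document is old enough.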
I know nothing about statistics - is there a better way?
Thanks,
Joe
Answer 0 (score: 0)
This is what I ended up doing:
public override float CustomScore(int doc, float subQueryScore, float valSrcScore)
{
    float contentScore = subQueryScore;
    double start = 1262307661d; // Unix time in seconds, 1 January 2010
    if (_dateVsContentModifier == 0)
    {
        // Age is not weighted at all, so fall back to the default scoring
        return base.CustomScore(doc, subQueryScore, valSrcScore);
    }
    // Use UtcNow: the epoch below is UTC, and DateTime subtraction ignores the Kind of its operands
    long epoch = (long)(DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)).TotalSeconds;
    long docSinceStartHours = (long)Math.Ceiling((valSrcScore - start) / 3600);
    long nowSinceStartHours = (long)Math.Ceiling((epoch - start) / 3600);
    float ratio = (float)docSinceStartHours / (float)nowSinceStartHours; // A fraction where a document created this hour has a value of 1
    float ageScore = (ratio * _dateVsContentModifier) + 1; // Add 1 so that the squaring below cannot make the value smaller
    float ageScoreAdjustedSoNewerIsBetter = 1;
    if (_newerContentModifier > 0)
    {
        // Square it, multiply it by the modifier, then take the square root, so that newer content scores higher than old content.
        // Note: for a positive ageScore this equals ageScore * sqrt(_newerContentModifier), so the boost is still linear in ageScore.
        ageScoreAdjustedSoNewerIsBetter = (float)Math.Sqrt((ageScore * ageScore) * _newerContentModifier);
    }
    return ageScoreAdjustedSoNewerIsBetter * contentScore;
}
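The basic idea is that the age score is a fraction, where 0 is the first day of 2010 and 1 is now. This decimal value is then multiplied by _dateVsContentModifier, which optionally boosts the date relative to the relevancy score.
The age score is then squared, multiplied by _newerContentModifier, and square-rooted. This results in newer content scoring higher than older content.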
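The age part of the calculation described above can be pulled out into a pure function, which makes it easier to try different modifier values without a Lucene index. This is a minimal sketch, assuming both timestamps are Unix seconds; `AgeBoost` and its parameter names are hypothetical, and the branch for `_dateVsContentModifier == 0` (which defers to `base.CustomScore`) is left out.

```csharp
using System;

// Hypothetical helper that mirrors the age arithmetic of CustomScore above.
public static class AgeBoost
{
    // Same 2010 reference point as in the answer (Unix seconds).
    private const double Start = 1262307661d;

    // Returns the factor that the content score is multiplied by.
    public static float Compute(double docEpochSeconds, double nowEpochSeconds,
                                float dateVsContentModifier, float newerContentModifier)
    {
        long docSinceStartHours = (long)Math.Ceiling((docEpochSeconds - Start) / 3600);
        long nowSinceStartHours = (long)Math.Ceiling((nowEpochSeconds - Start) / 3600);

        // 0 at the reference point, 1 for a document created this hour
        float ratio = (float)docSinceStartHours / (float)nowSinceStartHours;
        float ageScore = (ratio * dateVsContentModifier) + 1;

        // As in the original, a non-positive modifier leaves the content score untouched
        if (newerContentModifier <= 0)
        {
            return 1f;
        }
        return (float)Math.Sqrt((ageScore * ageScore) * newerContentModifier);
    }
}
```

For example, a document created 100 hours after the reference point, scored 200 hours after it, with `dateVsContentModifier = 2` and `newerContentModifier = 4`, has a ratio of 0.5, an age score of 2 and a boost of sqrt(2 * 2 * 4) = 4. Because the age score is always positive here, the result is the same as `ageScore * sqrt(newerContentModifier)`.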
Joe