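I have a Lucene index in which every document has several fields containing numeric values. I now want to sort search results by a weighted sum of these fields. For example: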
field1=100
field2=002
field3=014
The weighting function is:
f(d) = field1 * 0.5 + field2 * 1.4 + field3 * 1.8
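The results should be sorted by f(d), where d denotes a document. The sort function must not be static: it may differ from search to search, because the constant factors are influenced by the user performing the search.
Does anyone know how to solve this, or have an idea for achieving the same goal another way?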
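To make the target ordering concrete, here is a minimal plain-Java sketch (no Lucene involved) that evaluates f(d) for the sample values above: 100 * 0.5 + 2 * 1.4 + 14 * 1.8, which is roughly 78. The names WeightedSum and score are made up for illustration only.

```java
// Hypothetical helper, not part of Lucene: evaluates f(d) for one document.
final class WeightedSum {
    // values[i] is the numeric value of field i; weights[i] is its per-search factor.
    static double score(int[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * weights[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] doc = {100, 2, 14};            // field1, field2, field3 from the example
        double[] weights = {0.5, 1.4, 1.8};  // may differ from search to search
        System.out.println(score(doc, weights));
    }
}
```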
Answer 0 (score: 13)
You could try implementing a custom ScoreDocComparator. For example:
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortField;

public class ScaledScoreDocComparator implements ScoreDocComparator {

    // values[i][doc] is the cached int value of field i for document doc.
    private int[][] values;
    // scalars[i] is the weight applied to field i.
    private float[] scalars;

    public ScaledScoreDocComparator(IndexReader reader, String[] fields, float[] scalars) throws IOException {
        this.scalars = scalars;
        this.values = new int[fields.length][];
        for (int i = 0; i < values.length; i++) {
            this.values[i] = FieldCache.DEFAULT.getInts(reader, fields[i]);
        }
    }

    // Computes the weighted sum f(d) for the given hit.
    protected float score(ScoreDoc scoreDoc) {
        int doc = scoreDoc.doc;
        float score = 0;
        for (int i = 0; i < values.length; i++) {
            int value = values[i][doc];
            float scalar = scalars[i];
            score += (value * scalar);
        }
        return score;
    }

    @Override
    public int compare(ScoreDoc i, ScoreDoc j) {
        float iScore = score(i);
        float jScore = score(j);
        return Float.compare(iScore, jScore);
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> sortValue(ScoreDoc i) {
        float score = score(i);
        return Float.valueOf(score);
    }
}
Here is an example of instantiating ScaledScoreDocComparator. I believe it works in my tests, but I encourage you to verify it against your own data.
final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new ScaledScoreDocComparator(reader, fields, scalars);
            }
        }
    )
);

IndexSearcher indexSearcher = ...;
Query query = ...;
Filter filter = ...; // can be null
int nDocs = 100;

TopFieldDocs topFieldDocs = indexSearcher.search(query, filter, nDocs, sort);
ScoreDoc[] scoreDocs = topFieldDocs.scoreDocs;
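It appears the Lucene developers are deprecating the ScoreDocComparator interface (it is currently deprecated in the Subversion repository). Here is ScaledScoreDocComparator modified to follow ScoreDocComparator's successor, FieldComparator: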
// Written against the FieldComparator API as it stood in the Lucene Subversion
// repository at the time; method signatures may differ in released versions.
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.SortField;

public class ScaledComparator extends FieldComparator {

    private String[] fields;
    private float[] scalars;
    // slotValues[field][slot]: field values copied for the hits currently in the queue.
    private int[][] slotValues;
    // currentReaderValues[field][doc]: cached field values for the reader being searched.
    private int[][] currentReaderValues;
    // Slot of the weakest hit currently in the queue.
    private int bottomSlot;

    public ScaledComparator(int numHits, String[] fields, float[] scalars) {
        this.fields = fields;
        this.scalars = scalars;

        this.slotValues = new int[this.fields.length][];
        for (int fieldIndex = 0; fieldIndex < this.fields.length; fieldIndex++) {
            this.slotValues[fieldIndex] = new int[numHits];
        }

        this.currentReaderValues = new int[this.fields.length][];
    }

    // Computes the weighted sum over values[field][secondaryIndex].
    protected float score(int[][] values, int secondaryIndex) {
        float score = 0;
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            int value = values[fieldIndex][secondaryIndex];
            float scalar = scalars[fieldIndex];
            score += (value * scalar);
        }
        return score;
    }

    protected float scoreSlot(int slot) {
        return score(slotValues, slot);
    }

    protected float scoreDoc(int doc) {
        return score(currentReaderValues, doc);
    }

    @Override
    public int compare(int slot1, int slot2) {
        float score1 = scoreSlot(slot1);
        float score2 = scoreSlot(slot2);
        return Float.compare(score1, score2);
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        float bottomScore = scoreSlot(bottomSlot);
        float docScore = scoreDoc(doc);
        return Float.compare(bottomScore, docScore);
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            slotValues[fieldIndex][slot] = currentReaderValues[fieldIndex][doc];
        }
    }

    @Override
    public void setBottom(int slot) {
        bottomSlot = slot;
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase, int numSlotsFull) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            String field = fields[fieldIndex];
            currentReaderValues[fieldIndex] = FieldCache.DEFAULT.getInts(reader, field);
        }
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> value(int slot) {
        float score = scoreSlot(slot);
        return Float.valueOf(score);
    }
}
Using this new class is very similar to using the original, except that the sort object is defined slightly differently:
final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new FieldComparatorSource() {
            public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                return new ScaledComparator(numHits, fields, scalars);
            }
        }
    )
);
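For readers unfamiliar with the slot/bottom vocabulary, the following is a rough plain-Java analogy, not Lucene code: a bounded queue keeps the n best weighted sums seen so far, and each new document is compared against the weakest kept entry (the "bottom") before it is copied in. The class TopNByWeightedSum and its methods are hypothetical, and the sketch assumes that larger f(d) ranks first; flip the comparison for ascending order.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only; Lucene's real hit queue is more involved.
final class TopNByWeightedSum {

    // Weighted sum f(d) for one document's field values.
    static double score(int[] values, double[] weights) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += values[i] * weights[i];
        }
        return sum;
    }

    // Returns the n largest scores in descending order.
    static double[] topScores(int[][] docs, double[] weights, int n) {
        if (n <= 0) {
            return new double[0];
        }
        // Min-heap: the head is the weakest score currently kept (the "bottom").
        PriorityQueue<Double> queue = new PriorityQueue<>();
        for (int[] doc : docs) {
            double s = score(doc, weights);
            if (queue.size() < n) {
                queue.add(s);                 // free slot: just copy the hit in
            } else if (s > queue.peek()) {    // compare against the bottom
                queue.poll();                 // evict the weakest hit
                queue.add(s);                 // the heap head becomes the new bottom
            }
        }
        // Drain smallest-first while filling the result from the back.
        double[] result = new double[queue.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = queue.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] docs = { {100, 2, 14}, {10, 50, 1}, {0, 0, 100} };
        double[] weights = {0.5, 1.4, 1.8};
        for (double s : topScores(docs, weights, 2)) {
            System.out.println(s);
        }
    }
}
```

Only n values are held at any time, which is why the comparator above stores copies of the field values per slot rather than per document.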
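Answer 1 (score: 0)
I think one way to do this is to have the sort function accept these parameters:
the number of fields, the array of documents, and a list of weight factors (one per field)
Compute the weighting function for each document and store the results in a separate array, in the same order as the document array. Then perform whatever sort you like (quicksort is probably best), making sure you reorder the document array along with the f(d) array. Return the sorted document array and you are done.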
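A minimal sketch of that idea in plain Java, assuming each document is represented simply as an int[] of field values (that representation, and the names WeightedSort and sortedOrder, are assumptions for illustration). Instead of a hand-written quicksort it sorts an array of document positions with Arrays.sort, which keeps the scores and the documents aligned without swapping two arrays in lockstep.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: orders documents by ascending weighted sum f(d).
final class WeightedSort {

    // Returns the positions of docs, ordered so that f(d) is ascending.
    static Integer[] sortedOrder(int[][] docs, double[] weights) {
        // Step 1: compute f(d) once per document, in document order.
        final double[] scores = new double[docs.length];
        for (int d = 0; d < docs.length; d++) {
            for (int f = 0; f < weights.length; f++) {
                scores[d] += docs[d][f] * weights[f];
            }
        }
        // Step 2: sort document positions by their precomputed score.
        Integer[] order = new Integer[docs.length];
        for (int d = 0; d < order.length; d++) {
            order[d] = d;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]));
        return order;
    }

    public static void main(String[] args) {
        int[][] docs = { {100, 2, 14}, {10, 50, 1}, {0, 0, 100} };
        double[] weights = {0.5, 1.4, 1.8};
        System.out.println(Arrays.toString(sortedOrder(docs, weights)));
    }
}
```

Note that this sorts every matching document in memory, so it is only practical when the result set is small.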
Answer 2 (score: 0)
Implement your own Similarity class and override the idf(Term, Searcher) method. In that method you can return the score as follows:
float score = 1.0f; // fallback for any other field (this default is an assumption)
if (term.field().equals("field1")) {
    score = 0.5f * Integer.parseInt(term.text());
} else if (term.field().equals("field2")) {
    score = 1.4f * Integer.parseInt(term.text());
} // and so on
return score;
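When executing the query, make sure it covers all the fields. The query should look like this:
field1:term field2:term field3:term
The final score will also pick up some weight from query normalization. However, for the equation you gave, this will not affect the relative ranking of the documents.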
Answer 3 (score: 0)
Create a wrapper that holds the rating and is comparable. Something like:
// requires: import java.util.Arrays;
public void sort(Datum[] data) {
    // Wrap each datum together with its precomputed rating.
    Rating[] ratings = new Rating[data.length];
    for (int i = 0; i < data.length; i++)
        ratings[i] = new Rating(data[i]);
    Arrays.sort(ratings);
    // Copy the data back in ascending rating order.
    for (int i = 0; i < data.length; i++)
        data[i] = ratings[i].datum;
}

class Rating implements Comparable<Rating> {
    final double rating;
    final Datum datum;

    public Rating(Datum datum) {
        this.datum = datum;
        rating = datum.field1 * 0.5 + datum.field2 * 1.4 + datum.field3 * 1.8;
    }

    public int compareTo(Rating other) {
        return Double.compare(rating, other.rating);
    }
}
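Since the question asks for weights that change from search to search, here is a hedged variant of the same idea in which the weights are passed in rather than hard-coded. It is a sketch only: documents are assumed to be plain int[] arrays of field values, and PerSearchWeights, byWeightedSum and sorted are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: builds a fresh comparator for each search's weights.
final class PerSearchWeights {

    // Comparator ordering documents by ascending weighted sum under the given weights.
    static Comparator<int[]> byWeightedSum(final double[] weights) {
        return Comparator.comparingDouble((int[] doc) -> {
            double sum = 0.0;
            for (int i = 0; i < weights.length; i++) {
                sum += doc[i] * weights[i];
            }
            return sum;
        });
    }

    // Returns a sorted copy, leaving the caller's array untouched.
    static int[][] sorted(int[][] docs, double[] weights) {
        int[][] copy = docs.clone();
        Arrays.sort(copy, byWeightedSum(weights));
        return copy;
    }

    public static void main(String[] args) {
        int[][] docs = { {100, 2, 14}, {10, 50, 1} };
        // Two users with different weights can get different orderings.
        System.out.println(Arrays.deepToString(sorted(docs, new double[]{0.5, 1.4, 1.8})));
        System.out.println(Arrays.deepToString(sorted(docs, new double[]{0.1, 2.0, 0.1})));
    }
}
```

Unlike the Rating wrapper, this recomputes the sum on every comparison; precomputing the rating, as above, is likely the better choice for large inputs.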