Question

In short, I'm trying to determine a document's real document ID inside the method CustomScoreProvider.CustomScore, which is only given a document "ID" relative to a sub-IndexReader.

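More info: I'm trying to boost my documents' scores by a precomputed boost factor (think of an in-memory structure that maps Lucene's document IDs to boost factors). Unfortunately I can't store the boost in the index, for two reasons: the boost won't be used for every query, and the boost factors can change regularly, which would trigger a lot of reindexing.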

Instead, I'd like to boost the score at query time, so I've been working with CustomScoreQuery / CustomScoreProvider. The boost happens in the method CustomScoreProvider.CustomScore:

public override float CustomScore(int doc, float subQueryScore, float valSrcScore) {
    float baseScore = subQueryScore * valSrcScore;  // the default computation

    // boost -- THIS IS WHERE THE PROBLEM IS
    float boostedScore = baseScore * MyBoostCache.GetBoostForDocId(doc);
    return boostedScore;
}

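My problem is with the doc parameter passed to CustomScore. It isn't the real document ID: it is relative to the sub-reader used for that index segment. (The MyBoostCache class is my in-memory structure that maps Lucene's doc IDs to boost factors.) If I knew the reader's docBase, I could work out the real ID (id = doc + docBase).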

Any ideas on how I can determine the real ID, or is there a better way to accomplish what I'm doing?

(I'm aware that the IDs I'm trying to get can change, and I've already taken steps to make sure MyBoostCache always stays in sync with the latest IDs.)

Answer 1

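I was able to do this by passing the IndexSearcher to my CustomScoreProvider, using it to work out which sub-reader the CustomScoreProvider is working with, and then summing the MaxDoc of all the preceding sub-readers from the IndexSearcher to get the docBase.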

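// Offset of this provider's segment within the top-level reader.
private int DocBase { get; set; }

// CustomScoreProvider has no parameterless constructor, so the segment reader is passed on to the base class.
public MyScoreProvider(IndexReader reader, IndexSearcher searcher) : base(reader) {
    DocBase = GetDocBaseForIndexReader(reader, searcher);
}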

private static int GetDocBaseForIndexReader(IndexReader reader, IndexSearcher searcher) {
    // get all segment readers for the searcher
    IndexReader rootReader = searcher.GetIndexReader();
    var subReaders = new List<IndexReader>();
    ReaderUtil.GatherSubReaders(subReaders, rootReader);

    // sequentially loop through the subreaders until we find the specified reader, adjusting our offset along the way
    int docBase = 0;
    for (int i = 0; i < subReaders.Count; i++)
    {
        if (subReaders[i] == reader)
            break;
        docBase += subReaders[i].MaxDoc();
    }

    return docBase;
}

public override float CustomScore(int doc, float subQueryScore, float valSrcScore) {
    float baseScore = subQueryScore * valSrcScore;
    // doc is segment-relative; adding DocBase gives the top-level document ID
    float boostedScore = baseScore * MyBoostCache.GetBoostForDocId(doc + DocBase);
    return boostedScore;
}
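
The offset arithmetic behind GetDocBaseForIndexReader can be checked in isolation. What follows is only a minimal sketch that does not touch Lucene at all: the class DocBaseSketch, its method names, and the segment sizes are made up for illustration, and the only thing taken from the thread is the rule id = doc + docBase, with each docBase being the sum of MaxDoc over the preceding sub-readers.

```csharp
using System;

public static class DocBaseSketch
{
    // Returns the starting top-level doc ID (docBase) of each segment,
    // given the MaxDoc value of every segment in reader order.
    public static int[] ComputeDocBases(int[] segmentMaxDocs)
    {
        var docBases = new int[segmentMaxDocs.Length];
        int runningTotal = 0;
        for (int i = 0; i < segmentMaxDocs.Length; i++)
        {
            docBases[i] = runningTotal;          // sum of all earlier segments
            runningTotal += segmentMaxDocs[i];
        }
        return docBases;
    }

    // Converts a segment-relative doc ID into a top-level doc ID.
    public static int ToGlobalDocId(int[] docBases, int segmentIndex, int localDoc)
    {
        return docBases[segmentIndex] + localDoc;
    }

    public static void Main()
    {
        // Hypothetical index with three segments holding 5, 3 and 4 documents.
        int[] docBases = ComputeDocBases(new[] { 5, 3, 4 });
        Console.WriteLine(string.Join(", ", docBases));    // 0, 5, 8
        Console.WriteLine(ToGlobalDocId(docBases, 2, 1));  // 9
    }
}
```

With these made-up sizes, local doc 1 in the third segment maps to top-level ID 8 + 1 = 9, which is the value the provider above would hand to MyBoostCache.GetBoostForDocId.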
