Lucene.net - How do I extract a small snippet of text from each match?

Date: 2013-08-10 17:12:02

Tags: c# lucene.net

http://quranx.com/Search?q=oh+people+of+heaven&context=Quran

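Can anyone tell me how to change the following code so that it shows a text snippet for each matching result? I have tried reading examples and so on, but I can only find information for newer versions of Lucene for Java. Lucene seems like a black box to me.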

public static IEnumerable<SearchResult> Search(
    string queryString, 
    out int totalResults,
    int maxResults = 100)
{
    totalResults = 0;
    if (string.IsNullOrEmpty(queryString))
        return new List<SearchResult>();

    var query = new MultiFieldQueryParser(
        Lucene.Net.Util.Version.LUCENE_30,
        new string[] { "Body", "SecondaryReferences" },
        Analyzer
    ).Parse(queryString);

    var indexReader = DirectoryReader.Open(
        directory: Index,
        readOnly: true);
    var indexSearcher = new IndexSearcher(indexReader);
    var resultsCollector = TopScoreDocCollector.Create(
        numHits: maxResults,
        docsScoredInOrder: true
    );
    indexSearcher.Search(
        query: query,
        results: resultsCollector
    );
    totalResults = resultsCollector.TotalHits;
    var result = new List<SearchResult>();
    foreach (var scoreDoc in resultsCollector.TopDocs().ScoreDocs)
    {
        var snippets = new List<SearchResultSnippet>();
        var doc = indexSearcher.Doc(scoreDoc.Doc);
        var searchResult = new SearchResult(
            type: doc.Get("Type"),
            id: doc.Get("ID"),
            snippets: snippets
        );
        result.Add(searchResult);
    }
    return result;
}

1 Answer:

Answer 0 (score: 2)

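To access the text near a match, you need to store TermVectors with position and offset information at index time; you can then use them to retrieve the surrounding words.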
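A minimal sketch of the indexing side, assuming Lucene.Net 3.0.x and the "Body", "Type" and "ID" field names from the question. The `IndexingSketch` class and `BuildDocument` helper are hypothetical names used only for illustration, and the surrounding indexing code (IndexWriter, analyzer) is not shown. An existing index would have to be rebuilt for the term vectors to be present.

```csharp
using Lucene.Net.Documents;

public static class IndexingSketch
{
    // Hypothetical helper: builds one document whose "Body" field keeps
    // term vectors with positions and character offsets.
    public static Document BuildDocument(string type, string id, string body)
    {
        var doc = new Document();
        doc.Add(new Field("Type", type, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("ID", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field(
            "Body",
            body,
            Field.Store.YES,       // keep the original text so a snippet can be cut from it
            Field.Index.ANALYZED,  // tokenize with the analyzer so the field is searchable
            Field.TermVector.WITH_POSITIONS_OFFSETS));
        return doc;
    }
}
```

Storing the field value as well as the term vector is a choice made here so that the offsets can later be applied to the original text; whether that fits depends on how large the stored bodies are.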

For a detailed explanation, see http://searchhub.org/2009/05/26/accessing-words-around-a-positional-match-in-lucene/
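Once the character offsets of a match are known (for example, read from the stored term vector), cutting the snippet itself needs no Lucene calls. The sketch below is one possible way to do it, not the approach from the linked article: `SnippetSketch`, `Extract` and the `context` parameter are hypothetical names, and it assumes the field's original text is available as a string.

```csharp
using System;

public static class SnippetSketch
{
    // Returns the text around one match, given the match's character
    // offsets [matchStart, matchEnd) and how many characters of context to keep.
    public static string Extract(string text, int matchStart, int matchEnd, int context = 40)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (matchStart < 0 || matchEnd > text.Length || matchStart > matchEnd)
            throw new ArgumentOutOfRangeException("matchStart");

        int start = Math.Max(0, matchStart - context);
        int end = Math.Min(text.Length, matchEnd + context);

        // Widen to whitespace so the snippet does not begin or end mid-word.
        while (start > 0 && !char.IsWhiteSpace(text[start - 1])) start--;
        while (end < text.Length && !char.IsWhiteSpace(text[end])) end++;

        // Mark truncation at either side.
        string prefix = start > 0 ? "..." : "";
        string suffix = end < text.Length ? "..." : "";
        return prefix + text.Substring(start, end - start) + suffix;
    }
}
```

In the question's loop, each string returned by a call such as `SnippetSketch.Extract(body, startOffset, endOffset)` could be wrapped in a `SearchResultSnippet` and added to `snippets`; how that type is constructed is not shown in the question, so that step is left out here.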