Lucene.Net: How can I add a date filter to my search results?

Asked: 2010-12-30 18:44:51

Tags: .net language-agnostic lucene.net

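My searcher works really well, but it tends to return results that are obsolete. My site is much like NerdDinner, where events in the past become irrelevant.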

I'm currently indexing like this. Note: my example is in VB.NET, but I don't mind if examples are given in C#.

    Public Function AddIndex(ByVal searchableEvent As [Event]) As Boolean Implements ILuceneService.AddIndex

        Dim writer As New IndexWriter(luceneDirectory, New StandardAnalyzer(), False)

        Dim doc As Document = New Document

        doc.Add(New Field("id", searchableEvent.ID, Field.Store.YES, Field.Index.UN_TOKENIZED))
        doc.Add(New Field("fullText", FullTextBuilder(searchableEvent), Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("user", If(searchableEvent.User.UserName = Nothing,
                                     "User" & searchableEvent.User.ID,
                                     searchableEvent.User.UserName),
                                 Field.Store.YES,
                                 Field.Index.TOKENIZED))
        doc.Add(New Field("title", searchableEvent.Title, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("location", searchableEvent.Location.Name, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("date", searchableEvent.EventDate, Field.Store.YES, Field.Index.UN_TOKENIZED))

        writer.AddDocument(doc)

        writer.Optimize()
        writer.Close()
        Return True

    End Function

Notice that I have a "date" field that stores the event date.

My search then looks like this:

''# code omitted
        Dim reader As IndexReader = IndexReader.Open(luceneDirectory)
        Dim searcher As IndexSearcher = New IndexSearcher(reader)
        Dim parser As QueryParser = New QueryParser("fullText", New StandardAnalyzer())
        Dim query As Query = parser.Parse(q.ToLower)

        ''# We're using 10,000 as the maximum number of results to return
        ''# because I have a feeling that we'll never reach that full amount
        ''# anyways.  And if we do, who in their right mind is going to page
        ''# through all of the results?
        Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000)
        Dim doc As Document = Nothing

        ''# loop through the topDocs and grab the appropriate 10 results based
        ''# on the submitted page number
        While i <= last AndAlso i < topDocs.totalHits
                doc = searcher.Doc(topDocs.scoreDocs(i).doc)
                IDList.Add(doc.[Get]("id"))
                i += 1
        End While
''# code omitted

I did try the following, but to no avail (it threw a NullReferenceException).

        While i <= last AndAlso i < topDocs.totalHits
            If Date.Parse(doc.[Get]("date")) >= Date.Today Then
                doc = searcher.Doc(topDocs.scoreDocs(i).doc)
                IDList.Add(doc.[Get]("id"))
                i += 1
            End If
        End While

I also found the following documentation, but I can't make heads or tails of it: http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/search/DateFilter.html

2 Answers:

Answer 0 (score: 10)

You're linking to the API documentation for Lucene 1.4.3. Lucene.Net is currently at 2.9.2. I think an upgrade is due.

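First, you're using Store.YES a lot. Stored fields make your index larger, which can hurt performance. Your date problem can be solved by storing dates as strings in the format "yyyyMMddHHmmssfff" (that is very high resolution, down to milliseconds). You may want to reduce the resolution so that fewer distinct terms are created, which keeps the index smaller.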

var dateValue = DateTools.DateToString(searchableEvent.EventDate, DateTools.Resolution.MILLISECOND);
doc.Add(new Field("date", dateValue, Field.Store.YES, Field.Index.NOT_ANALYZED));
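
A plain string works here because a fixed-width, zero-padded timestamp sorts the same way alphabetically as it does chronologically, which is what a range filter on terms relies on. The sketch below illustrates this with the .NET base library only (no Lucene.Net involved). `SortableDate`, `ToKey`, `SameOrder`, and the sample dates are hypothetical names made up for the example; the format string is the one quoted above.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SortableDate
{
    // Fixed-width, zero-padded pattern with the most significant part first.
    public static string ToKey(DateTime value, string pattern = "yyyyMMddHHmmssfff")
    {
        return value.ToString(pattern, CultureInfo.InvariantCulture);
    }

    // True when ordinal string order agrees with chronological order.
    public static bool SameOrder(DateTime a, DateTime b)
    {
        int byString = Math.Sign(string.CompareOrdinal(ToKey(a), ToKey(b)));
        int byDate = Math.Sign(a.CompareTo(b));
        return byString == byDate;
    }

    public static void Main()
    {
        var events = new[]
        {
            new DateTime(2010, 12, 30, 18, 44, 51, 7),
            new DateTime(2010, 12, 30, 9, 5, 0, 0),
            new DateTime(2011, 1, 2, 0, 0, 0, 0),
        };

        foreach (var e in events)
            Console.WriteLine(ToKey(e));

        // At day resolution the two events on 2010-12-30 share one key,
        // which is why a coarser resolution yields fewer distinct terms.
        int distinctDays = events.Select(e => ToKey(e, "yyyyMMdd")).Distinct().Count();
        Console.WriteLine(distinctDays);
    }
}
```

If the time of day never matters for filtering, a day-level key is probably enough and keeps the number of distinct terms in the "date" field small.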

Then apply a filter to your search (the second parameter, where you currently pass Nothing/null).

var dateValue = DateTools.DateToString(DateTime.Now, DateTools.Resolution.MILLISECOND);
var filter = FieldCacheRangeFilter.NewStringRange("date", 
                 lowerVal: dateValue, includeLower: true, 
                 upperVal: null, includeUpper: false);
var topDocs = searcher.Search(query, filter, 10000);

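You could also do this with a BooleanQuery that combines your normal query with a RangeQuery, but that would affect scoring as well (scores are calculated from the query, not the filter). You may also prefer to leave the query unmodified for simplicity, so that you know exactly what query is executed.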
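
A side note on the loop attempted in the question: it reads `doc` before any document has been loaded, so `doc` is still null on the first pass, and it only advances `i` when the date check succeeds, so a single past event would keep it spinning. With a filter on the search that check is unnecessary, but if post-filtering is ever needed, the shape is roughly as follows. This is a sketch that uses plain .NET collections in place of Lucene hits; `Hit`, `Paging`, `UpcomingIds`, and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class Hit
{
    public string Id;
    public DateTime Date;
}

public static class Paging
{
    // Returns the ids of hits dated on or after 'today' within positions [first, last].
    public static List<string> UpcomingIds(IList<Hit> hits, int first, int last, DateTime today)
    {
        var ids = new List<string>();
        for (int i = first; i <= last && i < hits.Count; i++)
        {
            Hit hit = hits[i];      // load the hit before reading anything from it
            if (hit.Date >= today)
                ids.Add(hit.Id);
            // the counter advances whether or not the hit was kept
        }
        return ids;
    }

    public static void Main()
    {
        var hits = new List<Hit>
        {
            new Hit { Id = "1", Date = new DateTime(2010, 12, 1) },
            new Hit { Id = "2", Date = new DateTime(2011, 1, 15) },
        };
        List<string> ids = UpcomingIds(hits, 0, 9, new DateTime(2010, 12, 30));
        Console.WriteLine(string.Join(",", ids));
    }
}
```

Post-filtering a page this way can return fewer than ten ids per page, which is one more reason to let the search filter do the work.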

Answer 1 (score: 7)

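You can combine multiple queries with a BooleanQuery. Since Lucene only searches text, the date field in your index must be written from the most significant to the least significant part of the date, i.e. in ISO 8601 format ("2010-11-02T20:49:16.000000+00:00").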
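
One caveat with ISO 8601 strings is that text order only matches time order when every value carries the same offset. The sketch below, which uses only the .NET base library, shows two instants whose string order is wrong until both are normalized to UTC. `IsoKeys`, `AsWritten`, `AsUtc`, and the sample instants are hypothetical, and the format omits fractional seconds for brevity.

```csharp
using System;
using System.Globalization;

public static class IsoKeys
{
    // Formats the value with whatever offset it carries.
    public static string AsWritten(DateTimeOffset value)
    {
        return value.ToString("yyyy-MM-ddTHH:mm:sszzz", CultureInfo.InvariantCulture);
    }

    // Normalizes to UTC first so that every key ends in +00:00.
    public static string AsUtc(DateTimeOffset value)
    {
        return AsWritten(value.ToUniversalTime());
    }

    public static void Main()
    {
        var utcEvent = new DateTimeOffset(2010, 11, 2, 23, 0, 0, TimeSpan.Zero);
        var plusTwoEvent = new DateTimeOffset(2010, 11, 3, 0, 30, 0, TimeSpan.FromHours(2));

        // plusTwoEvent is the earlier instant (22:30 UTC versus 23:00 UTC).
        Console.WriteLine(plusTwoEvent < utcEvent);

        // Mixed offsets: does the string comparison also put plusTwoEvent first?
        Console.WriteLine(string.CompareOrdinal(AsWritten(plusTwoEvent), AsWritten(utcEvent)) < 0);

        // Normalized to UTC: same question.
        Console.WriteLine(string.CompareOrdinal(AsUtc(plusTwoEvent), AsUtc(utcEvent)) < 0);
    }
}
```

Converting every date to UTC before indexing, and doing the same for the bounds used at search time, avoids this problem.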

Example of combining the two queries:

Lucene.Net.Index.Term searchTerm = new Lucene.Net.Index.Term("fullText", searchTerms);
Lucene.Net.Index.Term dateRange = new Lucene.Net.Index.Term("date", "2010*");

Lucene.Net.Search.Query termQuery = new Lucene.Net.Search.TermQuery(searchTerm);
Lucene.Net.Search.Query dateRangeQuery = new Lucene.Net.Search.WildcardQuery(dateRange);

Lucene.Net.Search.BooleanQuery query = new Lucene.Net.Search.BooleanQuery();
query.Add(termQuery, BooleanClause.Occur.MUST);
query.Add(dateRangeQuery, BooleanClause.Occur.MUST);

Or, if a wildcard is not precise enough, you can add a RangeQuery instead:

Lucene.Net.Search.Query termQuery = new Lucene.Net.Search.TermQuery(searchTerm);
// Range bounds are compared as plain strings; "*" is not a wildcard here, so it is left out.
Lucene.Net.Index.Term date1 = new Lucene.Net.Index.Term("date", "2010-11-02");
Lucene.Net.Index.Term date2 = new Lucene.Net.Index.Term("date", "2010-11-03");
Lucene.Net.Search.Query dateRangeQuery = new Lucene.Net.Search.RangeQuery(date1, date2, true);

Lucene.Net.Search.BooleanQuery query = new Lucene.Net.Search.BooleanQuery();
query.Add(termQuery, BooleanClause.Occur.MUST);
query.Add(dateRangeQuery, BooleanClause.Occur.MUST);
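
It may help to see which stored values such a range actually covers. A range query compares terms as plain strings, and wildcard characters in the bounds are treated as ordinary characters rather than patterns. The sketch below imitates an inclusive range with ordinal string comparison, using the .NET base library only; `TermRange`, `InRange`, and the stored values are hypothetical, and ordinal comparison is assumed to match how the index orders terms.

```csharp
using System;

public static class TermRange
{
    // Inclusive range check using ordinal (code unit) comparison.
    public static bool InRange(string term, string lower, string upper)
    {
        return string.CompareOrdinal(term, lower) >= 0
            && string.CompareOrdinal(term, upper) <= 0;
    }

    public static void Main()
    {
        string[] stored =
        {
            "2010-11-01T23:59:59.000000+00:00",
            "2010-11-02T20:49:16.000000+00:00",
            "2010-11-03T08:00:00.000000+00:00",
        };

        foreach (string term in stored)
        {
            // A bare date is a prefix of every timestamp on that day, so it sorts
            // before all of them: the lower bound admits the whole of 2 November,
            // while the upper bound stops short of any timestamp on 3 November.
            Console.WriteLine(term + " -> " + InRange(term, "2010-11-02", "2010-11-03"));
        }
    }
}
```

In other words, the inclusive upper bound "2010-11-03" still excludes events that take place on 3 November; to include them, the upper bound would need to be the following day.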