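How do I group search results in Lucene.Net?

Time: 2013-03-02 08:45:08

Tags: lucene.net

I have managed to create documents and run some fairly complex searches, but I am having trouble grouping the search results.

Displaying the books after a search works fine. In addition, I need a count of matches grouped by Author, computed from the same search query.

Example: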

Author Name      | Count
A                | 12
B                | 2

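I am using Lucene.Net 3.0.3.0, which does not support grouping, but there may be a workaround. I also need the same feature for price ranges.

1 answer:

Answer 0 (score: 2)

Everything is possible if you write a custom Collector. What you describe is faceting, which is easy to solve by counting document values yourself. The core part is calling the IndexSearcher.Search overload that accepts a collector. The collector should read the values, usually through a field cache implementation, and do whatever calculation is needed.

Here is a short demo that uses some classes from my demo project, Corelicious.Lucene.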

var postTypes = new Dictionary<Int32, Int32>();
searcher.Search(query, new DelegatingCollector((reader, doc, scorer) => {
    var score = scorer.Score();
    if (score > 0) {
        var postType = SingleFieldCache.Default.GetInt32(reader, "PostTypeId", doc);
        if (postType.HasValue) {
            if (postTypes.ContainsKey(postType.Value)) {
                postTypes[postType.Value]++;
            } else {
                postTypes[postType.Value] = 1;
            }
        }
    }
}));
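
DelegatingCollector and SingleFieldCache in the snippet above come from the Corelicious.Lucene project. For readers without that library, a minimal sketch of a comparable collector written directly against Lucene.Net 3.0.3 might look like the following. The class name FieldCountCollector and the "Author" field are assumptions made for illustration, and the field is assumed to be single-valued and indexed as NOT_ANALYZED so that the field cache holds one whole value per document. The member names follow the 3.0.3 Collector API as best recalled; check them against your version.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper: counts how many matching documents carry each value
// of a single-valued, NOT_ANALYZED string field (for example "Author").
public class FieldCountCollector : Collector
{
    private readonly string field;
    private string[] values; // values of the current segment, indexed by segment-local doc id

    public FieldCountCollector(string field)
    {
        this.field = field;
        Counts = new Dictionary<string, int>();
    }

    public Dictionary<string, int> Counts { get; private set; }

    // Scores are not needed for plain counting.
    public override void SetScorer(Scorer scorer) { }

    // Called once per index segment: load that segment's values from the field cache.
    public override void SetNextReader(IndexReader reader, int docBase)
    {
        values = FieldCache_Fields.DEFAULT.GetStrings(reader, field);
    }

    // Called once per matching document; doc is relative to the current segment.
    public override void Collect(int doc)
    {
        var value = values[doc];
        if (value == null) return; // the document has no value for this field

        int count;
        Counts.TryGetValue(value, out count); // count stays 0 if the key is missing
        Counts[value] = count + 1;
    }

    // Counting does not depend on the order in which documents arrive.
    public override bool AcceptsDocsOutOfOrder
    {
        get { return true; }
    }
}
```

Under these assumptions it would be used as `var collector = new FieldCountCollector("Author"); searcher.Search(query, collector);`, after which `collector.Counts` maps each author name to its number of matching documents.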

Full code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using Corelicious.Lucene;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Directory = Lucene.Net.Store.Directory;
using Version = Lucene.Net.Util.Version;

namespace ConsoleApplication {
    public static class Program {
        public static void Main(string[] args) {
            Console.WriteLine("Creating directory...");
            var directory = new RAMDirectory();
            var analyzer = new StandardAnalyzer(Version.LUCENE_30);
            CreateIndex(directory, analyzer);

            var userQuery = "calculate pi";
            var queryParser = new QueryParser(Version.LUCENE_30, "Body", analyzer);
            var query = queryParser.Parse(userQuery);
            Console.WriteLine("Query: '{0}'", query);

            var indexReader = IndexReader.Open(directory, readOnly: true);
            var searcher = new IndexSearcher(indexReader);

            var postTypes = new Dictionary<Int32, Int32>();
            searcher.Search(query, new DelegatingCollector((reader, doc, scorer) => {
                var score = scorer.Score();
                if (score > 0) {
                    var postType = SingleFieldCache.Default.GetInt32(reader, "PostTypeId", doc);
                    if (postType.HasValue) {
                        if (postTypes.ContainsKey(postType.Value)) {
                            postTypes[postType.Value]++;
                        } else {
                            postTypes[postType.Value] = 1;
                        }
                    }
                }
            }));

            Console.WriteLine("Post type summary");
            Console.WriteLine("Post type  | Count");

            foreach(var pair in postTypes.OrderByDescending(x => x.Value)) {
                var postType = (PostType)pair.Key;
                Console.WriteLine("{0,-10} | {1}", postType, pair.Value);
            }

            Console.ReadLine();
        }

        public enum PostType {
            Question = 1,
            Answer = 2,
            Tag = 4
        }

        public static void CreateIndex(Directory directory, Analyzer analyzer) {
            using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
            using (var xmlStream = File.OpenRead("/Users/sisve/Downloads/Stack Exchange Data Dump - Sept 2011/Content/092011 Mathematics/posts.xml"))
            using (var xmlReader = XmlReader.Create(xmlStream)) {
                while (xmlReader.ReadToFollowing("row")) {
                    var tags = xmlReader.GetAttribute("Tags") ?? String.Empty;
                    var title = xmlReader.GetAttribute("Title") ?? String.Empty;
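                    // Field values must not be null, so fall back to an empty string like the other attributes.
                    var body = xmlReader.GetAttribute("Body") ?? String.Empty;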

                    var doc = new Document();

                    // tags are stored as <tag1><tag2>
                    foreach (Match match in Regex.Matches(tags, "<(.*?)>")) {
                        doc.Add(new Field("Tags", match.Groups[1].Value, Field.Store.NO, Field.Index.NOT_ANALYZED));
                    }

                    doc.Add(new Field("Title", title, Field.Store.NO, Field.Index.ANALYZED));
                    doc.Add(new Field("Body", body, Field.Store.NO, Field.Index.ANALYZED));
                    doc.Add(new Field("PostTypeId", xmlReader.GetAttribute("PostTypeId"), Field.Store.NO, Field.Index.NOT_ANALYZED));

                    writer.AddDocument(doc);
                }

                writer.Optimize();
                writer.Commit();
            }
        }
    }
}
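
The question also asks for the same kind of count over price ranges. The only change from counting per author is that each value is first mapped to a bucket. Below is a rough sketch of that bucketing step, kept independent of Lucene. RangeCounter is a hypothetical helper and the boundaries in the usage note are made up; inside a collector's Collect method one would read the document's price (for example from the field cache) and pass it to Add.

```csharp
using System;

// Hypothetical helper: counts values into ranges defined by ascending lower bounds.
// Bucket i covers [bounds[i], bounds[i + 1]); the last bucket is open-ended.
public class RangeCounter
{
    private readonly double[] bounds;
    private readonly int[] counts;

    public RangeCounter(params double[] ascendingLowerBounds)
    {
        if (ascendingLowerBounds == null || ascendingLowerBounds.Length == 0)
            throw new ArgumentException("At least one lower bound is required.");

        bounds = ascendingLowerBounds;
        counts = new int[bounds.Length];
    }

    public int BucketCount
    {
        get { return bounds.Length; }
    }

    // Adds one value to the bucket with the highest lower bound not exceeding it.
    // Values below the first lower bound are ignored.
    public void Add(double value)
    {
        for (var i = bounds.Length - 1; i >= 0; i--)
        {
            if (value >= bounds[i])
            {
                counts[i]++;
                return;
            }
        }
    }

    public int GetCount(int bucket)
    {
        return counts[bucket];
    }

    public double GetLowerBound(int bucket)
    {
        return bounds[bucket];
    }
}
```

For example, `new RangeCounter(0, 10, 50)` defines the ranges 0 to 10, 10 to 50, and 50 and above; after the search finishes, looping over the buckets with GetLowerBound and GetCount gives a table analogous to the author counts.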