Lucene.net returns the correct number of query hits, but not the correct documents

Time: 2010-08-17 18:52:20

Tags: lucene lucene.net

I'm new to Lucene and trying to work through this problem. I build my index like this:

        Directory dir = FSDirectory.Open(new System.IO.DirectoryInfo(dirIndexDir));

        //Create the indexWriter
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true,
            IndexWriter.MaxFieldLength.UNLIMITED);


        Document doc = new Document();

        doc.Add(new Field("keyform_type", entry.keyForm.type, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("keyform_lang", entry.keyForm.lang, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("keyform_dial", entry.keyForm.dial, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("keyform_reg", entry.keyForm.reg, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("keyform_term", entry.keyForm.term.Value, Field.Store.YES, Field.Index.ANALYZED));

        if (entry.refForm.type != null)
            doc.Add(new Field("refform_type", entry.refForm.type, Field.Store.YES, Field.Index.NOT_ANALYZED));
        if (entry.refForm.lang != null)
            doc.Add(new Field("refform_lang", entry.refForm.lang, Field.Store.YES, Field.Index.NOT_ANALYZED));
        if (entry.refForm.dial != null)
            doc.Add(new Field("refform_dial", entry.refForm.dial, Field.Store.YES, Field.Index.NOT_ANALYZED));
        if (entry.refForm.reg != null)
            doc.Add(new Field("refform_reg", entry.refForm.reg, Field.Store.YES, Field.Index.NOT_ANALYZED));
        if (entry.refForm.term.Value != null)
            doc.Add(new Field("refform_term", entry.refForm.term.Value, Field.Store.YES, Field.Index.ANALYZED));

        doc.Add(new Field("pos", entry.pos, Field.Store.YES, Field.Index.NOT_ANALYZED));

        for (int s = 0; s < entry.subject.Count; s++)
        {
            doc.Add(new Field("subject_" + s, entry.subject[s], Field.Store.YES, Field.Index.NOT_ANALYZED));
        }
        for (int g = 0; g < entry.sense.gloss.Count; g++)
        {
            doc.Add(new Field("gloss_" + g, entry.sense.gloss[g], Field.Store.YES, Field.Index.ANALYZED));
        }

        if (entry.signature.action != null)
            doc.Add(new Field("action", entry.signature.action, Field.Store.YES, Field.Index.NOT_ANALYZED));
        if (entry.signature.source != null)
            doc.Add(new Field("source", entry.signature.source, Field.Store.YES, Field.Index.NOT_ANALYZED));
        if (entry.signature.date == 0)
            doc.Add(new Field("date", entry.signature.date.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));

        //Add the doc
        writer.AddDocument(doc);

        writer.Close();

I then query it with this code:

        //Doesn't matter what term is, same result
        string term="workers";

        Directory dir = FSDirectory.Open(new System.IO.DirectoryInfo(luceneDir));

        IndexSearcher searcher = new IndexSearcher(dir, true);
        List<string> b=new List<string>();
        b.Add("keyform_gloss");
        b.Add("keyform_term");
        b.Add("refform_term");
        b.Add("refform_gloss");
        for (int i = 0; i < nMaxDupes; i++)
            b.Add("gloss_" + i.ToString());
        MultiFieldQueryParser mfqp = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29,
            b.ToArray(), new StandardAnalyzer());
        Query q = mfqp.Parse(term);
        TopDocs td = searcher.Search(q, 300);

        for (int i = 0; i < td.totalHits; i++)
        {
            //Generate a dictionaryEntry for each hit
            Document doc = searcher.Doc(i);

            //Access the document fields, blah
        }

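No matter what the value of term is, Lucene returns the first X documents in the index, where X = the number of documents that actually match the term. When I browse the index with LUKE, the same query typed in by hand (keyform_term:term gloss_0:term etc) returns the correct number of results and the correct documents for those results.

The C# code above, however, always returns the first X documents, which don't necessarily contain the search term in any of the searched fields. They aren't even close.

What am I doing wrong? I know the index is fine because I can search it in LUKE, so it must be something in the query...

Thanks!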

1 answer:

Answer 0 (score: 6)

The line:

        Document doc = searcher.Doc(i);

should be

        Document doc = searcher.Doc(td.scoreDocs[i].doc);

or the correct C# syntax equivalent (I'm a Java guy, sorry).
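
To spell out why this fixes it: searcher.Doc(n) takes an internal document ID, not a position in the result list, so looping i from 0 to totalHits simply loads documents 0, 1, 2, ... of the index. The IDs of the actual hits are in td.scoreDocs. Below is a minimal sketch of the corrected loop, assuming Lucene.Net 2.9.x is referenced (later releases, 3.0.3 onward as far as I know, rename these members to ScoreDocs, Doc and TotalHits). The in-memory index, the sample terms and the class name HitLoopSketch are made up for illustration and are not from the question.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class; only the hit loop at the end mirrors the fix.
class HitLoopSketch
{
    static void Main()
    {
        Lucene.Net.Util.Version version = Lucene.Net.Util.Version.LUCENE_29;
        StandardAnalyzer analyzer = new StandardAnalyzer(version);

        // Assumption: a throwaway in-memory index instead of the asker's FSDirectory.
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);

        // Made-up sample data: only documents 1 and 3 contain "workers".
        string[] terms = { "farmers", "workers", "teachers", "workers union" };
        foreach (string t in terms)
        {
            Document d = new Document();
            d.Add(new Field("keyform_term", t, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(d);
        }
        writer.Close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        QueryParser parser = new QueryParser(version, "keyform_term", analyzer);
        TopDocs td = searcher.Search(parser.Parse("workers"), 300);

        // Iterate over the hits that were actually returned. td.totalHits can
        // exceed the 300 requested, so scoreDocs.Length is the safe bound.
        for (int i = 0; i < td.scoreDocs.Length; i++)
        {
            int docId = td.scoreDocs[i].doc;     // internal ID of the i-th hit
            Document hit = searcher.Doc(docId);  // load by ID, not by rank i
            Console.WriteLine(docId + ": " + hit.Get("keyform_term"));
        }

        searcher.Close();
    }
}
```

Bounding the loop by td.scoreDocs.Length rather than td.totalHits is a separate precaution: totalHits counts every match for the query, while scoreDocs holds at most the number of hits requested from Search, so the original bound could run past the end of the array on a large index.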