How to store boost factors in the index in Lucene

Date: 2011-09-05 05:47:12

Tags: c# lucene.net

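I am using Lucene to search for products in an address book. I want to boost the search results according to a few specific criteria (for example, a match in the location field should carry more relevance than a match in the entity name). These criteria are fixed for my use case.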

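I tried to store the boost factor with the Field by calling SetBoost() at index time. But the scores of the results are not what I expected: Lucene appears to apply the same boost value to every field.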

Can anyone tell me where I am going wrong?

Here is the code I use to build the index.

Directory objIndexDirectory =
  FSDirectory.Open(new System.IO.DirectoryInfo(<PathOfIndexFolder>));
StandardAnalyzer objAnalyzer =
  new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
IndexWriter objWriter = new IndexWriter(
  objIndexDirectory, objAnalyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
Document objDocument = new Document();
Field objName =
  new Field("Name", "John Doe", Field.Store.YES, Field.Index.ANALYZED);
Field objLocation =
  new Field("Location", "NY", Field.Store.YES, Field.Index.NOT_ANALYZED);
objLocation.SetBoost(2f);
objDocument.Add(objName);
objDocument.Add(objLocation);
objWriter.AddDocument(objDocument);

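What I want to achieve is this. Suppose the index contains three entries:

  1. John Doe, NY
  2. John Foo, New Jersey
  3. XYZ, NY

In this case, if the search query is "John NY", the results should be ranked by relevance as follows:

  1. John Doe, NY
  2. XYZ, NY
  3. John Foo, New Jersey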

2 Answers:

Answer 0: (score: 2)

I can't figure out what you think is wrong with your approach, but here is the code I used to test it:

class Program
{
    static void Main(string[] args)
    {
        RAMDirectory dir = new RAMDirectory();

        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer());

        AddDocument(writer, "John Doe", "NY");
        AddDocument(writer, "John Foo", "New Jersey");
        AddDocument(writer, "XYZ", "NY");

        writer.Commit();

        BooleanQuery query = new BooleanQuery();
        query.Add(new TermQuery(new Term("Name", "john")), BooleanClause.Occur.SHOULD);
        query.Add(new TermQuery(new Term("Location", "NY")), BooleanClause.Occur.SHOULD);

        IndexReader reader = writer.GetReader();

        IndexSearcher searcher = new IndexSearcher(reader);
        var hits = searcher.Search(query, null, 10);

        // Loop over the returned page only; totalHits can exceed the 10 hits requested.
        for (int i = 0; i < hits.scoreDocs.Length; i++)
        {
            Document doc = searcher.Doc(hits.scoreDocs[i].doc);
            var explain = searcher.Explain(query, hits.scoreDocs[i].doc);
            Console.WriteLine("{0} - {1} - {2}", hits.scoreDocs[i].score, doc.ToString(), explain.ToString());
        }
    }

    private static void AddDocument(IndexWriter writer, string name, string address)
    {
        Document objDocument = new Document();
        Field objName = new Field("Name", name, Field.Store.YES, Field.Index.ANALYZED);
        Field objLocation = new Field("Location", address, Field.Store.YES, Field.Index.NOT_ANALYZED);
        objLocation.SetBoost(2f);
        objDocument.Add(objName);
        objDocument.Add(objLocation);
        writer.AddDocument(objDocument);
    }
}

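This code does return the results in the order you want. In fact, it returns them in that order even if you leave out the boost. I am not an expert on Lucene scoring, but I believe this is because "NY" matches the Location of "XYZ, NY" exactly, whereas the "John" query is only a partial match on the Name field. You can read the details in the explanation printed by searcher.Explain().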
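The ordering seen without the boost is consistent with Lucene's classic scoring formula as documented for the default Similarity in the 2.9 line. A rough sketch, assuming the default similarity and that norms are stored for both fields (which is the case for both ANALYZED and NOT_ANALYZED):

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)

\mathrm{norm}(t,d) = \mathrm{boost}(d)\cdot\mathrm{lengthNorm}(f)\cdot\mathrm{boost}(f),
\qquad
\mathrm{lengthNorm}(f) = \frac{1}{\sqrt{\text{number of terms in field } f}}
```

Here f is the field that term t belongs to. Location holds a single un-analyzed term, so its length norm is 1, while Name holds two terms, so its length norm is about 0.71 (and is rounded further, because the norm is stored in a single byte). "john" and "NY" each occur in two of the three documents, so their idf is the same. A Location-only match therefore already outscores a Name-only match with no boost at all, and SetBoost(2f) only widens that gap, since the field boost is multiplied into the same stored norm.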

Answer 1: (score: 0)

Have you tried MultiFieldQueryParser?
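One way to read this suggestion is to apply the per-field weights at query time instead of at index time. The following is a minimal sketch, assuming the Lucene.Net 2.9-style API used in the question; the class name BoostedQueryExample, the helper BuildQuery, and the weights 1 and 2 are made up for illustration, and it has not been run against the poster's index.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

static class BoostedQueryExample
{
    // Hypothetical helper: parses free text against Name and Location,
    // weighting Location matches twice as heavily as Name matches.
    public static Query BuildQuery(string userInput)
    {
        // Location was indexed NOT_ANALYZED, so its query terms must not be
        // lowercased; KeywordAnalyzer leaves them exactly as typed.
        PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
        analyzer.AddAnalyzer("Location", new KeywordAnalyzer());

        // Assumed weights; these are query-time boosts, not stored in the index.
        Dictionary<string, float> boosts = new Dictionary<string, float>();
        boosts["Name"] = 1f;
        boosts["Location"] = 2f;

        MultiFieldQueryParser parser = new MultiFieldQueryParser(
            Lucene.Net.Util.Version.LUCENE_29,
            new string[] { "Name", "Location" },
            analyzer,
            boosts);

        return parser.Parse(userInput);
    }
}
```

Calling BuildQuery("John NY") should produce a query that tries each word against both fields, with the Location clauses carrying the higher boost. Because the weights live in the query rather than in the stored norms, they can be changed without reindexing. A multi-word location such as New Jersey would need to be quoted in the input to match the single un-analyzed term.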