How do I get more documents like the current document using Lucene 4.3?

Asked: 2013-06-09 20:37:09

Tags: lucene

Using Lucene 4.3.0.

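I am new to Lucene. I want to get more documents like the currently selected document. From my research, older versions of Lucene had a MoreLikeThis class, which behaves similarly to what I want.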

I put together some toy code to test the options. Indexing is done, and the index includes TermVectors.

Code excerpt:

QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "body", this.analyzer);
Query query = null ;
try {
    query = parser.parse(searchterm);
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    simpleresult = simpleresult + "HITS = " + hits.length + "\n";
    IndexReader ir = isearcher.getIndexReader() ; //2013-06-09 testing
    simpleresult = simpleresult + "Total Indexed Num Docs = " + ir.numDocs() + "\n" ;

    //Loop through results and construct simple string representation
    for (int i = 0; i < hits.length; i++) {
        Document hitdoc = isearcher.doc(hits[i].doc);
        float docscore = hits[i].score ;

        simpleresult = simpleresult + "=======" + (i+1) + "=======\n" ;
        simpleresult = simpleresult + "DOCDBID: " + hitdoc.get("dbid") + "\n" ;
        simpleresult = simpleresult + "Score: " + docscore + "\n" ;

        simpleresult = simpleresult + "File: " + hitdoc.get("filename") + "\n" ;
        simpleresult = simpleresult + hitdoc.get("body") ;
        simpleresult = simpleresult + "\n--------META--------\n" ;
        simpleresult = simpleresult + hitdoc.get("meta") ;
        simpleresult = simpleresult + "==============\n" ;

        //TESTING 2013-06-09
        //Trying to mimic similar documents
        //Feed the text contents of the current document back into another query?
        query = parser.parse(hitdoc.get("body"));
        ScoreDoc[] simhits = isearcher.search(query, null, 1000).scoreDocs;
        TopDocs top = isearcher.search(query, 10);
        simpleresult = simpleresult + "Similar Hits = " + simhits.length + "\n";
        simpleresult = simpleresult + "Top Hits MaxScore= " + top.getMaxScore() + "\n"; //why does this score differ from the above scores???????
        simpleresult = simpleresult + "Top Hits = " + top.totalHits + "\n";

      }
} catch (ParseException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

this.close() ;

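Again, this is an excerpt from a toy example so that I can learn Lucene better. It essentially just runs a simple query, displays each result (in a GUI), and then re-queries using each document's contents to find similar documents, mimicking MoreLikeThis. What I want is to retrieve documents that are similar to a given document.

Is my example the right way to do this in Lucene 4+?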

1 answer:

Answer 0 (score: 1)

MoreLikeThis still exists. It is in the lucene-queries jar. I think using it should be simple:

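MoreLikeThis mlt = new MoreLikeThis(ir); // ir is the searcher's IndexReader
Query likeQuery = mlt.like(hits[i].doc); // like(int docNum) may throw IOException
TopDocs results = isearcher.search(likeQuery, 10); // second argument caps the number of hits
//etc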
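
To illustrate why this differs from feeding the whole body text back into a QueryParser: as far as I understand it, MoreLikeThis does not query on every word of the document. It picks a limited number of "interesting" terms (frequent in the document, rare in the index) and builds the query from those. Below is a rough plain-Java sketch of that idea only. It does not use Lucene and is not Lucene's implementation; the class name SimilarTermsSketch, its methods, the tokenizer, and the tf * idf formula are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, Lucene-free illustration of MoreLikeThis-style term selection. */
final class SimilarTermsSketch {

    /** Lower-cases the text and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns up to maxTerms terms of corpus[docIndex], ranked by tf * idf, best first. */
    static List<String> interestingTerms(String[] corpus, int docIndex, int maxTerms) {
        // Document frequency: the number of documents that contain each term.
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : corpus) {
            for (String term : new HashSet<>(tokenize(doc))) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        // Term frequency inside the selected document.
        Map<String, Integer> termFreq = new HashMap<>();
        for (String term : tokenize(corpus[docIndex])) {
            termFreq.merge(term, 1, Integer::sum);
        }
        // Assumed scoring formula: tf * (ln(N / df) + 1).
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            double idf = Math.log((double) corpus.length / docFreq.get(e.getKey())) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b); // alphabetical tie-break
        });
        return new ArrayList<>(ranked.subList(0, Math.min(maxTerms, ranked.size())));
    }

    public static void main(String[] args) {
        String[] corpus = {
            "lucene index search lucene query",
            "search engine index and query",
            "cooking pasta with tomato sauce"
        };
        // "lucene" occurs twice in document 0 and in no other document, so it ranks first.
        System.out.println(interestingTerms(corpus, 0, 2)); // prints [lucene, index]
    }
}
```

One practical consequence for a toy index: the real MoreLikeThis also applies thresholds before it selects terms (setMinTermFreq and setMinDocFreq, which default to 2 and 5 if I remember correctly), so with only a handful of small documents the generated query can come out empty until those are lowered.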