Missing hits in Lucene index search

Time: 2011-06-06 06:04:18

Tags: java indexing full-text-search lucene

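I index an overview of a large database (text fields only) that users must be able to search (see the indexFields method below). Previously the search was done in the database with ILIKE queries, but that was slow, so the search now runs against the index. However, when I compare the results of the DB query with the results of the index search, the index search always returns far fewer results. I am not sure whether I made a mistake in indexing or in searching. Everything here seems to make sense to me. Any ideas?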

Here is the code. All suggestions are appreciated!

 // INDEXING
StandardAnalyzer analyzer = new StandardAnalyzer(
        Version.LUCENE_CURRENT, stopSet); // stop set is empty
IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true,
        IndexWriter.MaxFieldLength.UNLIMITED);

indexFields(writer);
writer.optimize();
writer.commit();
writer.close();
analyzer.close();

private void indexFields(IndexWriter writer) {

    DetachedCriteria criteria = DetachedCriteria
            .forClass(Activit.class);

    int count = 0;
    int max = 50000;
    boolean existMoreToIndex = true;

    List<Activit> result = new ArrayList<Activit>();


    while (existMoreToIndex) {

        try {
            result = activitService.listPaged(count, max);
            if (result.size() < max)
                existMoreToIndex = false;

            if (result.size() == 0)
                return;

            for (Activit ao : result) {
                Document doc = new Document();
                doc.add(new Field("id", String.valueOf(ao.getId()),
                        Field.Store.YES, Field.Index.ANALYZED));
                if (ao.getActivitOwner() != null)
                    doc.add(new Field("field1", ao.getActivitOwner(), Field.Store.YES, Field.Index.ANALYZED));
                if (ao.getActivitResponsible() != null)
                    doc.add(new Field("field2", ao.getActivitResponsible(), Field.Store.YES, Field.Index.ANALYZED));

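                try {
                    writer.addDocument(doc);
                } catch (CorruptIndexException e) {
                    e.printStackTrace();
                }
            }
            count += max;
        } catch (Exception e) {
            // catch clause assumed; the original snippet is cut off here
            e.printStackTrace();
            existMoreToIndex = false;
        }
    }
}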

 //SEARCH
    public List<Activit> searchActivitiesInIndex(String searchCriteria) {
    Set<String> stopSet = new HashSet<String>(); // empty because we do not want to remove stop words
    Version version = Version.LUCENE_CURRENT;
    String[] fields = {
            "field1", "field2"};
    try {
        File tempFile = new File("C://testindex");
        Directory INDEX_DIR = new SimpleFSDirectory(tempFile);
        Searcher searcher = new IndexSearcher(INDEX_DIR, true);

        QueryParser parser = new MultiFieldQueryParser(version, fields, new StandardAnalyzer(
                version, stopSet));


        Query query = parser.parse(searchCriteria);

        TopDocs topDocs = searcher.search(query, 500);

        ScoreDoc[] hits = topDocs.scoreDocs;


        // here I always get fewer hits than from the database query

        searcher.close();
    } catch (Exception e) {
        e.printStackTrace();
    }


}

1 Answer:

Answer 0: (score: 1)

Most likely the analyzer is doing something you don't expect.

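Open your index with Luke: it shows what your (analyzed) indexed documents look like, as well as your parsed query, which should let you see what is going wrong.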
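One common source of this kind of gap can be sketched without Lucene at all. Assuming the old SQL used a pattern such as ILIKE '%term%' (the question does not show it), the database matched any substring, whereas a plain term query against an analyzed field only matches whole tokens. The class below is a hypothetical illustration: its `tokenize` method is a rough stand-in for analysis (lowercase, split on non-alphanumeric characters), not Lucene's StandardAnalyzer, and the sample value "Johnson, Anna" is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenMatchDemo {

    // Rough approximation of analysis: lowercase, then split on anything
    // that is not a letter or digit. NOT Lucene's StandardAnalyzer.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Mimics SQL "column ILIKE '%term%'": case-insensitive substring match.
    public static boolean ilikeMatch(String text, String term) {
        return text.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    // Mimics a plain term query: the term must equal a whole indexed token.
    public static boolean termMatch(String text, String term) {
        return tokenize(text).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String owner = "Johnson, Anna"; // made-up field value
        System.out.println(ilikeMatch(owner, "john"));   // substring match
        System.out.println(termMatch(owner, "john"));    // whole-token match
        System.out.println(termMatch(owner, "johnson")); // whole-token match
    }
}
```

Under these assumptions, "john" is found by the substring match but not by the token match, while "johnson" is found by both. If this is the cause, a trailing wildcard query such as john* may recover part of the difference. Separately, searcher.search(query, 500) returns at most 500 entries in scoreDocs, so the array length cannot exceed 500 even when more documents match; topDocs.totalHits reports the full count.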

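Also, can you give an example of a searchCriteria value and the corresponding SQL query? Without that, it is hard to tell whether the indexing is done correctly. You may also not need MultiFieldQueryParser, which is quite inefficient.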