Question

For some reason, I can't get any results from a valid index of 3552 items.

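Please see the code below, followed by the console output from running it. 3552 is the number of indexed documents. /c:/test/stuff.txt is the correct indexed path, retrieved from document 5 as a test. All of the text at the bottom is the full text of the test file (as XML-style output). What am I missing that keeps my simple query from producing any results?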

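Maybe my WildcardQuery syntax is wrong? I expected it to be inefficient (because of the leading and trailing wildcards), but it should at least return this document from the index...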
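The efficiency intuition can be illustrated with a toy model in plain Java. This is a sketch under simplifying assumptions, not Lucene's actual implementation: the term dictionary is modelled as a sorted set, and a wildcard pattern is tested against individual terms. The class and method names (`WildcardScanSketch`, `match`, `literalPrefix`) are hypothetical. With a literal prefix such as `stuff*`, the scan can seek to the prefix and stop as soon as terms no longer start with it; with a leading `*`, the literal prefix is empty, so every term in the field has to be tested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy model of wildcard matching over a sorted term dictionary (not Lucene's code).
public class WildcardScanSketch {

    // Converts a wildcard pattern ('*' = any run of characters, '?' = exactly one)
    // into an equivalent regex, quoting every other character literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Literal text before the first wildcard character; "" when the pattern starts with one.
    static String literalPrefix(String wildcard) {
        int i = 0;
        while (i < wildcard.length() && wildcard.charAt(i) != '*' && wildcard.charAt(i) != '?') {
            i++;
        }
        return wildcard.substring(0, i);
    }

    // Returns the matching terms; examined[0] receives how many terms were tested against the pattern.
    static List<String> match(TreeSet<String> terms, String wildcard, int[] examined) {
        Pattern regex = toRegex(wildcard);
        String prefix = literalPrefix(wildcard);
        List<String> hits = new ArrayList<>();
        examined[0] = 0;
        // Seek to the first term >= prefix; an empty prefix means starting at the very first term.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can share the prefix
            }
            examined[0]++;
            if (regex.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(List.of(
                "apple", "html", "meta", "morestuff", "plain", "stuff", "stuffing", "utf", "xml"));
        int[] examined = new int[1];

        List<String> prefixHits = match(terms, "stuff*", examined);
        System.out.println("stuff*  -> " + prefixHits + ", examined " + examined[0] + " terms");

        List<String> infixHits = match(terms, "*stuff*", examined);
        System.out.println("*stuff* -> " + infixHits + ", examined " + examined[0] + " terms");
    }
}
```

With the nine sample terms, `stuff*` tests only 2 terms while `*stuff*` has to test all 9. In this model the pattern is only ever compared with indexed terms, never with stored document text, so a field that contributed no terms cannot match any wildcard pattern.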

import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.FSDirectory;


public class Searcher
{

    /**
    * @param args args[0] is the path to the Lucene index directory
    * @throws IOException
    * @throws CorruptIndexException
    */
    public static void main(String[] args) throws CorruptIndexException, IOException
    {

        System.out.println("Begin searching test...");

        IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(args[0])));

        // termContainsWildcard is shown to be true here when debugging
        // numberOfTerms is 0
        WildcardQuery query = new WildcardQuery(new Term("contents", "*stuff*"));

        System.out.println("Query field is: " + query.getTerm().field());
        System.out.println("Query field contents is: " + query.getTerm().text());

        TopDocs results = searcher.search(query, 5000);

        // no results returned :(
        System.out.println("Total results from index " + args[0] + ": " + results.totalHits);

        for (ScoreDoc sd : results.scoreDocs)
        {
            System.out.println("Document matched. Number: " + sd.doc);
        }

        System.out.println();

        System.out.println("Begin reading test...");

        // now read from the index to see if I am crazy
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));

        // correctly shows the number of documents in the local index
        System.out.println("Number of indexed documents: " + reader.numDocs());

        // pick out a random, small document and check its fields
        Document d = reader.document(5);

        for (Fieldable f : d.getFields())
        {
            System.out.println("Field name is: " + f.name());
            System.out.println(new String(f.getBinaryValue()));
        }
    }
}

Console output when running

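Begin searching test...
Query field is: contents
Query field contents is: *stuff*
Total results from index C:\INDEX2: 0

Begin reading test...
Number of indexed documents: 3552
Field name is: path
/c:/test/stuff.txt
Field name is: contents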
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Length" content="8"/>
<meta name="Content-Encoding" content="UTF-8"/>
<meta name="Content-Type" content="text/plain"/>
<meta name="resourceName" content="stuff.txt"/>
<title/>
</head>
<body>
<p>stuff 
</p>
</body>
</html>

Answer 1

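You could try running the query in Luke and testing a few different queries. You can also use Luke to browse the index terms, which may give you a clue about what is going on. The code you used to index the documents might also offer some hints: for example, are your fields actually indexed? You are reading the binary value of contents, which may mean the field was never tokenized and therefore never indexed.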
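The stored-versus-indexed distinction can be sketched with a toy inverted index in plain Java. This is a simplified model, not Lucene's API: `StoredVsIndexedSketch`, `FieldSpec`, and the crude tokenizer are all hypothetical. A field flagged as stored but not indexed (similar in spirit to a binary stored field) is still returned by `document(n)`, yet it contributes no terms to the postings, so no query on that field can match it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index showing why a stored-but-unindexed field can never match a query.
public class StoredVsIndexedSketch {

    // Hypothetical field description with two independent flags.
    static final class FieldSpec {
        final String name;
        final String value;
        final boolean stored;
        final boolean indexed;

        FieldSpec(String name, String value, boolean stored, boolean indexed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    // field name -> (term -> ids of documents containing that term)
    private final Map<String, TreeMap<String, TreeSet<Integer>>> postings = new HashMap<>();
    // doc id -> stored field values (what a document(n) call would return)
    private final List<Map<String, String>> storedDocs = new ArrayList<>();

    int addDocument(List<FieldSpec> fields) {
        int docId = storedDocs.size();
        Map<String, String> stored = new HashMap<>();
        for (FieldSpec f : fields) {
            if (f.stored) {
                stored.put(f.name, f.value);
            }
            if (f.indexed) {
                // Crude tokenizer: lower-case, then split on anything that is not a letter or digit.
                for (String token : f.value.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                    if (token.isEmpty()) {
                        continue;
                    }
                    postings.computeIfAbsent(f.name, k -> new TreeMap<>())
                            .computeIfAbsent(token, k -> new TreeSet<>())
                            .add(docId);
                }
            }
        }
        storedDocs.add(stored);
        return docId;
    }

    // Exact term lookup: searching consults only the postings, never the stored values.
    TreeSet<Integer> search(String field, String term) {
        TreeMap<String, TreeSet<Integer>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return new TreeSet<>();
        }
        return terms.get(term);
    }

    Map<String, String> document(int docId) {
        return storedDocs.get(docId);
    }

    public static void main(String[] args) {
        StoredVsIndexedSketch index = new StoredVsIndexedSketch();
        String text = "<html><body><p>stuff</p></body></html>";

        // Doc 0: stored only, so it is retrievable but adds no terms.
        int storedOnly = index.addDocument(List.of(new FieldSpec("contents", text, true, false)));
        // Doc 1: stored and indexed, so it is retrievable and searchable.
        int storedAndIndexed = index.addDocument(List.of(new FieldSpec("contents", text, true, true)));

        System.out.println("contents:stuff matches docs " + index.search("contents", "stuff"));
        System.out.println("doc " + storedOnly + " stored contents: " + index.document(storedOnly).get("contents"));
        System.out.println("doc " + storedAndIndexed + " stored contents: " + index.document(storedAndIndexed).get("contents"));
    }
}
```

Here `contents:stuff` matches only doc 1, while doc 0's text is still readable through `document(0)`. That mirrors the symptom in the question: `reader.document(5)` prints the contents, but the query returns 0 hits.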

Answer 2

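Leading wildcard queries (wildcard queries that start with *) are disabled by default in Lucene's QueryParser. See the Lucene FAQ for details. If you want to enable leading wildcard queries, try: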

queryParser.setAllowLeadingWildcard(true); // queryParser is your QueryParser instance

Can't find any results from a valid index using PhraseQuery or WildcardQuery?
