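Apache Lucene 5.0.1 FuzzyQuery search problem

Asked: 2015-05-18 14:57:02

Tags: java apache lucene fuzzy-search

I'm having trouble implementing a FuzzyQuery search in Apache Lucene 5.0.1.

I have a regular case-sensitive query working (using a modified StandardAnalyzer): it reads a text file containing various words and reports whether, and how many times, they match a hard-coded string. I can't get the same approach to work with FuzzyQuery. Here is the program, assuming you have imported the Lucene packages/libraries: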

public class FuzzyMatcherTest {

private final static String indexPath = "C:"+File.separator+"Users"+File.separator+"username"+File.separator+"Documents"+File.separator+"ComparisonDocs";
private final static String filePath1 = "C:"+File.separator+"Users"+File.separator+"username"+File.separator+"Documents"+File.separator+"ComparisonDocs"+File.separator+"test.txt";

public static void main(String[] args) throws CorruptIndexException, LockObtainFailedException, IOException, ParseException {

    System.out.println("Your Stuff:");
    createIndex();
    searchIndex("Test");
    cleanDirectory();

    }

public static void createIndex() throws CorruptIndexException, LockObtainFailedException, IOException {
    Analyzer analyzer = new MyStandardAnalyzer();
    IndexWriterConfig indexAnalyzer = new IndexWriterConfig(analyzer);
    IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File(indexPath).toPath()), indexAnalyzer);

    File file = new File(filePath1);
    Document document = new Document();

    InputStream fis = new FileInputStream(file);
    InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
    BufferedReader br = new BufferedReader(isr);
    String line;
    while((line = br.readLine()) != null){
        System.out.println(line);
        StringField stringField = new StringField("field", line, Field.Store.YES);
        document.add(stringField);  
    }
    indexWriter.addDocument(document);
    br.close();
    indexWriter.close();
}

public static void searchIndex(String searchString) throws IOException, ParseException {
    System.out.println("Searching for '" + searchString + "'");

    IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File(indexPath).toPath())));

    FuzzyQuery myFuzzyQuery = new FuzzyQuery(new Term(searchString, "field"),2);
    try{
        TopDocs top = indexSearcher.search(myFuzzyQuery,100);
        System.out.println("Number of hits in TopDoc array: " + top.totalHits);

    }
    catch(Exception e)
    {
        System.out.println("Oops");
    }

}

}

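This code returns 0 hits, even though the text file being searched should produce several matches, some of them exact.

1 Answer:

Answer 0: (score: 0)

First, the arguments to your Term are reversed: the field name comes first, then the text. It should be: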

FuzzyQuery myFuzzyQuery = new FuzzyQuery(new Term("field", searchString),2);

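Second, since you are using StandardAnalyzer (or something like it), everything in the index will be lowercase.

Your query, however, is not run through the analyzer. If you want analysis applied to the query, you can use the QueryParser (note: fuzzy queries are not actually analyzed, but the parser takes care of lowercasing for you):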

QueryParser parser = new QueryParser("field", new MyStandardAnalyzer());
Query parsedQuery = parser.parse(searchString + "~");

Otherwise, you may simply need to lowercase the term yourself:

FuzzyQuery myFuzzyQuery = new FuzzyQuery(new Term("field", searchString.toLowerCase()),2);
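
To see why the lowercasing matters for a fuzzy match, here is a rough sketch in plain Java with no Lucene dependency. It assumes the index terms are lowercase and uses ordinary Levenshtein distance as a stand-in for the edit distance FuzzyQuery applies (Lucene's default also counts a transposition as a single edit, which this sketch does not). The class name EditDistanceSketch and the indexed term "tents" are made up for illustration.

```java
import java.util.Locale;

public class EditDistanceSketch {

    // Plain, case-sensitive Levenshtein distance (insert, delete, substitute).
    // This is only an approximation of what FuzzyQuery measures.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String searchString = "Test";
        String indexedTerm = "tents"; // hypothetical lowercase term in the index
        int maxEdits = 2;             // same budget as in the FuzzyQuery above

        int raw = levenshtein(searchString, indexedTerm);
        int lowered = levenshtein(searchString.toLowerCase(Locale.ROOT), indexedTerm);

        System.out.println("raw distance:        " + raw + " (within budget: " + (raw <= maxEdits) + ")");
        System.out.println("lowercased distance: " + lowered + " (within budget: " + (lowered <= maxEdits) + ")");
    }
}
```

The capital T costs one edit by itself, so a term that would otherwise sit right at the edit limit gets pushed past it unless the search string is lowercased first.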