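I'm having trouble implementing a FuzzyQuery search in Apache Lucene 5.0.1.
I was able to implement a regular case-sensitive query (using a modified StandardAnalyzer) that reads a text file containing various words and returns the matches, and the number of matches, against a hard-coded string. I can't get the same approach to work with FuzzyQuery. Here is the program, assuming you have imported the Lucene packages/libraries: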
    public class FuzzyMatcherTest {
        private final static String indexPath = "C:"+File.separator+"Users"+File.separator+"username"+File.separator+"Documents"+File.separator+"ComparisonDocs";
        private final static String filePath1 = "C:"+File.separator+"Users"+File.separator+"username"+File.separator+"Documents"+File.separator+"ComparisonDocs"+File.separator+"test.txt";

        public static void main(String[] args) throws CorruptIndexException, LockObtainFailedException, IOException, ParseException {
            System.out.println("Your Stuff:");
            createIndex();
            searchIndex("Test");
            cleanDirectory();
        }

        public static void createIndex() throws CorruptIndexException, LockObtainFailedException, IOException {
            Analyzer analyzer = new MyStandardAnalyzer();
            IndexWriterConfig indexAnalyzer = new IndexWriterConfig(analyzer);
            IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File(indexPath).toPath()), indexAnalyzer);
            File file = new File(filePath1);
            Document document = new Document();
            InputStream fis = new FileInputStream(file);
            InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
            BufferedReader br = new BufferedReader(isr);
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
                StringField stringField = new StringField("field", line, Field.Store.YES);
                document.add(stringField);
            }
            indexWriter.addDocument(document);
            br.close();
            indexWriter.close();
        }

        public static void searchIndex(String searchString) throws IOException, ParseException {
            System.out.println("Searching for '" + searchString + "'");
            IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File(indexPath).toPath())));
            FuzzyQuery myFuzzyQuery = new FuzzyQuery(new Term(searchString, "field"),2);
            try {
                TopDocs top = indexSearcher.search(myFuzzyQuery,100);
                System.out.println("Number of hits in TopDoc array: " + top.totalHits);
            } catch (Exception e) {
                System.out.println("Oops");
            }
        }
    }
This code returns 0 hits. (The text file being searched should produce several matches, some of them exact.)
Answer 0 (score: 0)
First, the arguments to your Term are backwards: the field name comes first, then the text. It should be:
    FuzzyQuery myFuzzyQuery = new FuzzyQuery(new Term("field", searchString),2);
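Second, since you are using StandardAnalyzer (or something similar), everything in the index is lowercased.
Your query, however, is not run through the analyzer. If you want analysis applied to the query, you can use the QueryParser (note: fuzzy queries are not actually analyzed, but the parser will take care of lowercasing for you):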
    QueryParser parser = new QueryParser("field", new MyStandardAnalyzer());
    Query parsedQuery = parser.parse(searchString + "~");
Otherwise, you may just need to lowercase the term yourself:
    FuzzyQuery myFuzzyQuery = new FuzzyQuery(new Term("field", searchString.toLowerCase()),2);
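To see why lowercasing matters for a fuzzy match, it may help to count edits by hand. The following is a minimal sketch in plain Java, not Lucene's implementation: `EditDistanceSketch`, `levenshtein` and `withinMaxEdits` are hypothetical names made up for illustration, and the sample term "tents" is an assumed index entry. Lucene's own matching is automaton-based and, by default, also counts a swap of two adjacent characters as one edit, which this sketch does not model. It only shows that a case mismatch uses up one of the two edits allowed by the `2` passed to FuzzyQuery.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class EditDistanceSketch {

    /** Plain Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when the query term is within maxEdits of the (assumed) indexed term. */
    static boolean withinMaxEdits(String query, String indexedTerm, int maxEdits) {
        return levenshtein(query, indexedTerm) <= maxEdits;
    }

    public static void main(String[] args) {
        String indexedTerm = "tents"; // assumed lowercased term in the index
        String raw = "Test";
        String lowered = raw.toLowerCase(Locale.ROOT);
        System.out.println(raw + " within 2 edits: " + withinMaxEdits(raw, indexedTerm, 2));
        System.out.println(lowered + " within 2 edits: " + withinMaxEdits(lowered, indexedTerm, 2));
    }
}
```

In this sketch "Test" is three edits away from "tents", while "test" is only two, so the lowercased term stays inside the limit and the original does not.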