使用Lucene模糊搜索时,如何获得匹配的模糊项及其偏移?
IndexSearcher mem = ....(some standard code)
QueryParser parser = new QueryParser(Version.LUCENE_30, CONTENT_FIELD, analyzer);
TopDocs topDocs = mem.search(parser.parse("wuzzy~"), 1);
// the ~ triggers the fuzzy search as per "Lucene In Action"
模糊搜索工作正常。如果文档包含术语“模糊”或“luzzy”,则匹配。如何获得匹配的术语以及它们的偏移量?
我已确保所有CONTENT_FIELD都添加了termVectorStored,其中包含位置和偏移量。
答案 0 :(得分:6)
没有直接的方法这样做,但是我重新考虑了Jared的建议并且能够使解决方案有效。
我在这里记录这个以防其他人有同样的问题。
创建一个实现org.apache.lucene.search.highlight.Formatter
public class HitPositionCollector implements Formatter
{
// MatchOffset is a simple DTO
private List<MatchOffset> matchList;
public HitPositionCollector(
{
matchList = new ArrayList<MatchOffset>();
}
// this ie where the term start and end offset as well as the actual term is captured
@Override
public String highlightTerm(String originalText, TokenGroup tokenGroup)
{
if (tokenGroup.getTotalScore() <= 0)
{
}
else
{
MatchOffset mo= new MatchOffset(tokenGroup.getToken(0).toString(), tokenGroup.getStartOffset(),tokenGroup.getEndOffset());
getMatchList().add(mo);
}
return originalText;
}
/**
* @return the matchList
*/
public List<MatchOffset> getMatchList()
{
return matchList;
}
}
主要代码
public void testHitsWithHitPositionCollector() throws Exception
{
System.out.println(" .... testHitsWithHitPositionCollector");
String fuzzyStr = "bro*";
QueryParser parser = new QueryParser(Version.LUCENE_30, "f", analyzer);
Query fzyQry = parser.parse(fuzzyStr);
TopDocs hits = searcher.search(fzyQry, 10);
QueryScorer scorer = new QueryScorer(fzyQry, "f");
HitPositionCollector myFormatter= new HitPositionCollector();
//Highlighter(Formatter formatter, Scorer fragmentScorer)
Highlighter highlighter = new Highlighter(myFormatter,scorer);
highlighter.setTextFragmenter(
new SimpleSpanFragmenter(scorer)
);
Analyzer analyzer2 = new SimpleAnalyzer();
int loopIndex=0;
//for (ScoreDoc sd : hits.scoreDocs) {
Document doc = searcher.doc( hits.scoreDocs[0].doc);
String title = doc.get("f");
TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
hits.scoreDocs[0].doc,
"f",
doc,
analyzer2);
String fragment = highlighter.getBestFragment(stream, title);
System.out.println(fragment);
assertEquals("the quick brown fox jumps over the lazy dog", fragment);
MatchOffset mo= myFormatter.getMatchList().get(loopIndex++);
assertTrue(mo.getEndPos()==15);
assertTrue(mo.getStartPos()==10);
assertTrue(mo.getToken().equals("brown"));
}