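I am trying to implement a wildcard query, but I am stuck. Can anyone help me?
If the user searches for "AB", I want to return every entry that matches the regular expression "A[/.\-'+&,]{0,1}B[/.\-'+&,]{0,1}". I know a regular expression cannot be used here; I only want to illustrate the expected result.
So a search for "AB" should return results such as "ABC x", "abc x", "Abcdefg", "A.b.c.", "A-B-C", "A B C d", and "Table".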
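To make the expected behaviour concrete, here is a minimal plain-Java sketch of the matching rule described above, independent of Lucene. The class and method names (`ExpectedMatchSketch`, `toPattern`, `matches`) are hypothetical, and the separator class additionally contains the space character, which is an assumption made so that "A B C d" matches as listed.

```java
import java.util.regex.Pattern;

class ExpectedMatchSketch {
    // Optional separator after each letter (assumption: the space is included).
    static final String OPTIONAL_SEPARATOR = "[/.\\-'+&, ]?";

    // Builds a case-insensitive pattern such as A[sep]?B[sep]? from "AB".
    static Pattern toPattern(String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : term.toCharArray()) {
            regex.append(Pattern.quote(String.valueOf(c))).append(OPTIONAL_SEPARATOR);
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // True if the candidate contains the term, ignoring case and single separators.
    static boolean matches(String term, String candidate) {
        return toPattern(term).matcher(candidate).find();
    }

    public static void main(String[] args) {
        String[] names = {"ABC x", "abc x", "Abcdefg", "A.b.c.", "A-B-C", "A B C d", "Table", "Axb"};
        for (String name : names) {
            System.out.println(name + " -> " + matches("AB", name));
        }
    }
}
```

This only documents the target behaviour; it says nothing about how the index tokenizes the names.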
I created a model with this analyzer:
    @Indexed
    @AnalyzerDef(name = "abAnalyzer",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters =
        {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = StopFilterFactory.class, params =
            {
                @Parameter(name = "ignoreCase", value = "true")
            })
        })
    public class Foo
    {
        ...
        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
        @Analyzer(definition = "abAnalyzer")
        private String name;
        ...
    }
I implemented the query as shown below. I get all the expected results except for ones like "A.b.c". What am I doing wrong? What have I misunderstood?
    public List<Foo> getResults(final String searchName) throws ParseException
    {
        Session session = this.sessionFactory.openSession();
        FullTextSession fullTextSession = Search.getFullTextSession(session);
        Transaction tx = fullTextSession.beginTransaction();
        // Run the input through the field's analyzer so it is normalized like the indexed terms.
        Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("abAnalyzer");
        QueryParser qp = new QueryParser(Version.LUCENE_36, "name", analyzer);
        String cleanedText = qp.parse(searchName).toString("name");
        QueryBuilder qBuilder = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(Foo.class).get();
        BooleanQuery bQuery = new BooleanQuery();
        // Clause 1: the characters appear next to each other, e.g. *ab*
        org.apache.lucene.search.Query query = qBuilder.keyword().wildcard().onField("name").matching("*" + cleanedText + "*").createQuery();
        bQuery.add(query, BooleanClause.Occur.SHOULD);
        // Clause 2: exactly one arbitrary character between the characters, e.g. *a?b*
        query = qBuilder.keyword().wildcard().onField("name").matching("*" + createSearchString(cleanedText) + "*").createQuery();
        bQuery.add(query, BooleanClause.Occur.SHOULD);
        // The indexed entity is Foo, so the result type is Foo.class.
        org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(bQuery, Foo.class);
        List<Foo> results = hibQuery.list();
        tx.commit();
        session.close();
        return results;
    }
....
    // Inserts the single-character wildcard '?' between the characters of name, e.g. "ab" -> "a?b".
    private String createSearchString(final String name)
    {
        StringBuilder searchName = new StringBuilder("");
        for (int i = 0; i < name.length(); i++)
        {
            if (searchName.length() > 0)
            {
                searchName.append("?");
            }
            searchName.append(name.charAt(i));
        }
        return searchName.toString();
    }
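To see what the two wildcard clauses can and cannot match, here is a small plain-Java sketch that emulates `*` (any run of characters) and `?` (exactly one character) with `java.util.regex`. The names `WildcardSketch`, `interleave`, and `wildcardMatches` are hypothetical, and the sketch assumes the whole name is a single lowercased term, which depends on how the tokenizer splits the text.

```java
import java.util.regex.Pattern;

class WildcardSketch {
    // Same idea as createSearchString: "ab" -> "a?b".
    static String interleave(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            if (i > 0) {
                sb.append('?');
            }
            sb.append(term.charAt(i));
        }
        return sb.toString();
    }

    // Emulates wildcard matching against one term: '*' -> ".*", '?' -> ".".
    static boolean wildcardMatches(String wildcard, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        String withGaps = "*" + interleave("ab") + "*"; // "*a?b*"
        System.out.println(wildcardMatches("*ab*", "abc"));     // contiguous clause
        System.out.println(wildcardMatches(withGaps, "a.b.c")); // one-character-gap clause
        System.out.println(wildcardMatches(withGaps, "abc"));   // '?' needs exactly one character
    }
}
```

Because `?` requires exactly one character, the `*a?b*` clause cannot match "abc", which is why the query combines both clauses with `SHOULD`. Wildcard terms are generally not passed through the analyzer, so the pattern has to match the indexed form of the term exactly.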
Answer 0 (score: 0)
I think I found a solution. The Lucene index is built from lowercased strings, and the "special" characters are removed from the index.
So I changed the model to:
    @Indexed
    @AnalyzerDef(name = "abAnalyzer",
        charFilters =
        {
            // Strips the separator characters before the text reaches the tokenizer.
            @CharFilterDef(factory = PatternReplaceCharFilterFactory.class, params =
            {
                @Parameter(name = "pattern", value = Foo.PATTERN),
                @Parameter(name = "replacement", value = Foo.REPLACEMENT_PATTERN)
            })
        },
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters =
        {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class)
        })
    public class Foo
    {
        // Matches one separator character; the surrounding letter groups are
        // re-emitted through $1$2, so only the separator is dropped.
        public static final String PATTERN = "([A-Za-z]*)[/.\\-'+&, ]([A-Za-z]*)";
        public static final String REPLACEMENT_PATTERN = "$1$2";

        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
        @Analyzer(definition = "abAnalyzer")
        private String name;
        ....
    }
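The effect of stripping separators and lowercasing can be checked outside Lucene. The following is a minimal sketch that uses a simplified pattern (only the separator class, replaced with the empty string) rather than the exact constants from the model; `NormalizeSketch`, `normalize`, and `containsTerm` are hypothetical names.

```java
import java.util.Locale;

class NormalizeSketch {
    // Simplified stand-in for the char filter pattern (assumption: same separator set).
    static final String SEPARATOR_PATTERN = "[/.\\-'+&, ]";

    // Removes separators, then lowercases, roughly mirroring char filter + lowercase filter.
    static String normalize(String text) {
        return text.replaceAll(SEPARATOR_PATTERN, "").toLowerCase(Locale.ROOT);
    }

    // Equivalent of a *term* wildcard query against the normalized name.
    static boolean containsTerm(String term, String name) {
        return normalize(name).contains(normalize(term));
    }

    public static void main(String[] args) {
        String[] names = {"ABC x", "A.b.c.", "A-B-C", "A B C d", "Table", "Axb"};
        for (String name : names) {
            System.out.println(name + " -> " + normalize(name) + " -> " + containsTerm("AB", name));
        }
    }
}
```

Since the space is also stripped, a multi-word name collapses into a single run of letters, so one `*ab*` clause is enough to cover the cases listed in the question.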
For the query, I implemented this:
    public List<Foo> getResults(final String name)
    {
        List<Foo> result = new ArrayList<>();
        // Strip the "special" characters from the input, as the char filter does at index time.
        String searchName = name.replaceAll(Foo.PATTERN, Foo.REPLACEMENT_PATTERN);
        Session session = this.sessionFactory.openSession();
        try
        {
            FullTextSession fullTextSession = Search.getFullTextSession(session);
            Transaction tx = fullTextSession.beginTransaction();
            // Analyze the input so it is lowercased like the indexed terms.
            Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("abAnalyzer");
            QueryParser qp = new QueryParser(Version.LUCENE_36, "name", analyzer);
            String cleanedText = qp.parse(searchName).toString("name");
            BooleanQuery bQuery = new BooleanQuery();
            bQuery.add(new WildcardQuery(new Term("name", "*" + cleanedText + "*")), BooleanClause.Occur.SHOULD);
            org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(bQuery, Foo.class);
            result = hibQuery.list();
            tx.commit();
        }
        catch (Exception e)
        {
            // TODO: replace with proper error handling
            e.printStackTrace();
        }
        finally
        {
            session.close();
        }
        return result;
    }