I'm using Apache Solr. I want to search for "B" and have Solr return "AB", "BA", and "ABA".
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.addFilterQuery("color:*B*");
However, this throws an exception. What should I do?
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'color:*B*': '*' or '?' not allowed as first character in WildcardQuery
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:211)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:114)
... 17 more
Caused by: org.apache.lucene.queryParser.ParseException: '*' or '?' not allowed as first character in WildcardQuery
Answer 0 (score: 1)
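Values such as AB, ABA, or BA are each a single unit (one word). When you index them, they are stored in the inverted index as whole terms, i.e. ABA, BA, and so on, so a search for B finds nothing in the inverted index. What you need to do is index the field with n-grams: in schema.xml, give the field type="nGram" instead of type="text" or type="string". This indexes partial words in addition to the complete words. Once the field has been indexed with n-grams, a search for B returns AB, BA, and ABA. Keep in mind, though, that n-gram indexing is expensive in both space and time.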
For example, assuming your field is named color, in schema.xml:
<field name="color" type="nGram" indexed="true" stored="true" required="false" />
Also check that schema.xml contains the following XML (if it does not, copy and paste it in):
<fieldType name="nGram" class="solr.TextField"
positionIncrementGap="100" stored="false" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<!-- potentially word delimiter, synonym filter, stop words,
NOT stemming -->
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="1"
maxGramSize="15"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<!-- potentially word delimiter, synonym filter, stop words,
NOT stemming -->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
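To make the index-time behaviour concrete, here is a minimal sketch in plain Java of what lowercasing followed by n-gram expansion does to a single token. It does not use Solr or Lucene: the class `NGramSketch` and its methods `ngrams` and `matches` are hypothetical names made up for illustration, and the sketch collapses duplicate grams into a set, which is a simplification of what the real filter emits.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of Solr or Lucene. It only imitates the idea
// behind the index analyzer above: lowercase the token, then expand it into
// n-grams between minGramSize and maxGramSize.
class NGramSketch {

    // Collects every distinct substring of the lowercased token whose length
    // lies between minGram and maxGram (inclusive).
    static Set<String> ngrams(String token, int minGram, int maxGram) {
        String t = token.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < t.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= t.length(); len++) {
                grams.add(t.substring(start, start + len));
            }
        }
        return grams;
    }

    // The query side only lowercases the term, so a term matches when its
    // lowercased form is one of the grams produced at index time.
    static boolean matches(String indexedValue, String queryTerm) {
        return ngrams(indexedValue, 1, 15).contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] values = {"AB", "BA", "ABA", "AC"};
        for (String value : values) {
            System.out.println(value + " -> " + ngrams(value, 1, 15)
                    + ", matches B: " + matches(value, "B"));
        }
    }
}
```

Under these assumptions, once the field has been reindexed with the nGram type, the filter query from the question can be written without wildcards, for example `color:B`, which avoids the leading-wildcard parse error.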