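How can I make Lucene index field names case-insensitive? In other words, is there a way to lowercase the field names in a query, but not the values?
I can't convert the whole query to lowercase, because that would affect other queries that use a whitespace analyzer.
Query.extractTerms() returns the set of terms, but it does not work when the input contains a wildcard such as *.
I need this because my indexed field names are lowercase.
For example, if my field is indexed as "actor", I should be able to get results for queries containing "Actor:abc" as well as "ACTOR:abc".
Any ideas?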
Answer 0 (score: 0)
The solution is to create your own Analyzer and add a LowerCaseFilter to its filter chain.
Here is an example of a custom case-insensitive French analyzer:
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.fr.FrenchLightStemFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.ElisionFilter;
import org.apache.lucene.util.Version;
import java.io.Reader;
/**
 * Completes {@link org.apache.lucene.analysis.fr.FrenchAnalyzer} with accent management
 */
public class CustomFrenchAnalyzer extends Analyzer {

    /**
     * Lucene version
     */
    private final Version matchVersion;

    /**
     * Constructs a new analyzer
     * @param matchVersion compatibility version
     */
    public CustomFrenchAnalyzer(final Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    @Override
    protected final TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        final Tokenizer source = new StandardTokenizer(matchVersion, reader);
        TokenStream result = new StandardFilter(matchVersion, source);
        // Strip elided articles (l', d', ...) before any other processing
        result = new ElisionFilter(result, FrenchAnalyzer.DEFAULT_ARTICLES);
        // Lowercase once, before stop-word removal, so capitalized stop words are removed too
        result = new LowerCaseFilter(matchVersion, result);
        result = new StopFilter(matchVersion, result, FrenchAnalyzer.getDefaultStopSet());
        // Fold accented characters to their ASCII equivalents
        result = new ASCIIFoldingFilter(result);
        result = new FrenchLightStemFilter(result);
        return new TokenStreamComponents(source, result);
    }
}
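Note that an analyzer like the one above lowercases the term values, not the field names. If the goal is for "Actor:abc" and "ACTOR:abc" to hit a field indexed as "actor", one possible approach is to lowercase only the field prefixes in the raw query string before handing it to the query parser. The following is a minimal sketch using only the JDK; the class name FieldNameNormalizer and the method lowercaseFieldNames are hypothetical and not part of Lucene. It assumes field names consist of letters, digits, and underscores, and it leaves quoted phrases, escaped colons, and the values themselves (including wildcards) untouched.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: lowercases "Field:" prefixes only.
final class FieldNameNormalizer {

    // Alternative 1 matches a whole quoted phrase so that it can be skipped.
    // Alternative 2 captures a field name that starts a clause (start of input,
    // or after whitespace, '(', '+', '-', '!') and is directly followed by ':'.
    private static final Pattern FIELD_PREFIX = Pattern.compile(
            "\"(?:\\\\.|[^\"\\\\])*\""
            + "|(?<![^\\s(+\\-!])([A-Za-z_][A-Za-z0-9_]*)(?=:)");

    private FieldNameNormalizer() {
    }

    static String lowercaseFieldNames(String query) {
        Matcher m = FIELD_PREFIX.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String field = m.group(1);
            // group(1) is null when a quoted phrase matched: copy it unchanged
            String replacement = (field == null)
                    ? m.group()
                    : field.toLowerCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The normalized string can then be passed to whatever query parser is already in use, so the values are still analyzed per field as before. This works on the query text rather than on the parsed Query, which sidesteps the wildcard problem with Query.extractTerms() mentioned in the question, but it is only a sketch: query syntax outside the stated assumptions would need extra handling.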