How to make Lucene index fields case-insensitive

Date: 2014-04-02 12:13:22

Tags: lucene

How can I make Lucene index fields case-insensitive? I mean: is there a way to lowercase the field name in a query, but not the value?

I can't convert the whole query to lowercase, because that would affect other queries that use a whitespace analyzer.

The Query.extractTerms() method gives me the terms, but it doesn't work if the input contains a wildcard, i.e. *.

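I need this because my field names are indexed in lowercase. For example, if my field is indexed as "actor", I should get results for a query containing "Actor:abc" as well as one containing "ACTOR:abc".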

Any ideas?

1 Answer:

Answer 0 (score: 0)

The solution is to create your own Analyzer and add a LowerCaseFilter to its filter chain.

Here is an example of a case-insensitive custom French analyzer:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.fr.FrenchLightStemFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.ElisionFilter;
import org.apache.lucene.util.Version;

import java.io.Reader;

/**
 * Completes {@link org.apache.lucene.analysis.fr.FrenchAnalyzer} with accent management
 */
public class CustomFrenchAnalyzer extends Analyzer {

    /**
     * Lucene version
     */
    private final Version matchVersion;

    /**
     * Constructs a new analyzer
     * @param matchVersion compatibility version
     */
    public CustomFrenchAnalyzer(final Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    @Override
    protected final TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        final Tokenizer source = new StandardTokenizer(matchVersion, reader);
        TokenStream result = new StandardFilter(matchVersion, source);
        // Strip elided French articles such as l' and d'
        result = new ElisionFilter(result, FrenchAnalyzer.DEFAULT_ARTICLES);
        // Lowercase before stop-word removal so "Le" and "LE" are dropped as well as "le"
        result = new LowerCaseFilter(matchVersion, result);
        result = new StopFilter(matchVersion, result, FrenchAnalyzer.getDefaultStopSet());
        // Fold accented characters (e.g. "é" -> "e") so accented and unaccented forms match
        result = new ASCIIFoldingFilter(result);
        result = new FrenchLightStemFilter(result);

        // The chain is already lowercased once; a second LowerCaseFilter would be redundant
        return new TokenStreamComponents(source, result);
    }
}
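Note that the analyzer above normalizes the indexed terms (the values), not the field names. For the field-name part of the question ("Actor:abc" vs. "ACTOR:abc"), one possible workaround is to lowercase only the field prefixes in the query string before handing it to the query parser; because this runs before parsing, wildcards in the values do not get in the way. The sketch below is a hypothetical helper, not a Lucene API: the class FieldNameLowercaser and the method lowercaseFieldNames are invented for illustration. It assumes classic QueryParser syntax, where a field name is a run of letters, digits, '_' or '.' at the start of a clause, directly followed by ':'. Quoted phrases and backslash-escaped characters are copied unchanged.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Lucene): lowercases field names in a
 * classic Lucene query string while leaving the search terms untouched.
 */
public final class FieldNameLowercaser {

    private FieldNameLowercaser() {}

    /** Characters after which a new clause (and therefore a field name) may start. */
    private static boolean isClauseStart(char c) {
        return Character.isWhitespace(c) || c == '(' || c == '+' || c == '-' || c == '!';
    }

    /** Characters assumed to be allowed in a field name. */
    private static boolean isFieldChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '.';
    }

    public static String lowercaseFieldNames(String query) {
        StringBuilder out = new StringBuilder(query.length());
        boolean inQuotes = false;
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Copy an escaped character (e.g. "\:") verbatim
                out.append(c).append(query.charAt(i + 1));
                i += 2;
                continue;
            }
            if (c == '"') {
                // Enter or leave a quoted phrase; nothing inside it is rewritten
                inQuotes = !inQuotes;
                out.append(c);
                i++;
                continue;
            }
            boolean atClauseStart = i == 0 || isClauseStart(query.charAt(i - 1));
            if (!inQuotes && atClauseStart && isFieldChar(c)) {
                int end = i;
                while (end < query.length() && isFieldChar(query.charAt(end))) {
                    end++;
                }
                String word = query.substring(i, end);
                boolean isField = end < query.length() && query.charAt(end) == ':';
                // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i
                out.append(isField ? word.toLowerCase(Locale.ROOT) : word);
                i = end;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseFieldNames("ACTOR:abc AND Title:\"Star Wars\""));
    }
}
```

For example, ACTOR:abc AND Title:"Star Wars" becomes actor:abc AND title:"Star Wars", so the values keep their case for the whitespace-analyzed fields. A more robust alternative would be to subclass Lucene's classic QueryParser and lowercase the field argument in its protected query-building methods (such as getFieldQuery and getWildcardQuery), so the parser's own syntax handling is reused instead of a hand-written scanner.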