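HTMLStripCharFilter does not work in the createComponents implementation of a custom Analyzer

Time: 2016-05-18 11:33:10

Tags: java lucene

I am using HTMLStripCharFilter in the createComponents implementation of my custom Analyzer, but the HTML is not stripped from the content. Please find the code below.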

@Override
protected TokenStreamComponents createComponents(String fieldName) 
{
    StandardTokenizer source = new StandardTokenizer();
    source.setReader(mStripHTML ? new HTMLStripCharFilter(getReader()) : getReader());
    source.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(source);
    result = new LowerCaseFilter(result);
    return new TokenStreamComponents(source, result);
}

2 answers:

Answer 0 (score: 1)

Your CharFilter should not be defined in your createComponents method; it belongs in initReader:

@Override
protected Reader initReader(String fieldName, Reader reader) {
    return mStripHTML ? new HTMLStripCharFilter(reader) : reader;
}

@Override
protected TokenStreamComponents createComponents(String fieldName) 
{
    StandardTokenizer source = new StandardTokenizer();
    source.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(source);
    result = new LowerCaseFilter(result);
    return new TokenStreamComponents(source, result);
}
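
To see why moving the filter into initReader helps, here is a toy model that uses only the Java standard library. It is not Lucene code. It assumes, as in Lucene 5 and later, that the analyzer base class owns the Reader and passes it through the initReader hook before anything reads from it, so createComponents never sees the reader. ToyAnalyzer, ToyTagStripReader, StrippingAnalyzer, and run are hypothetical names made up for this sketch, and the tag stripper is far more naive than HTMLStripCharFilter.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

public class InitReaderSketch {

    /** Toy stand-in for an analyzer base class (NOT Lucene's Analyzer). */
    static abstract class ToyAnalyzer {
        /** Hook: subclasses may wrap the raw reader, e.g. with a char filter. */
        protected Reader initReader(String fieldName, Reader reader) {
            return reader;
        }

        /** The base class owns the reader: it applies the hook, then consumes the text. */
        public final String analyze(String fieldName, String text) {
            try (Reader r = initReader(fieldName, new StringReader(text))) {
                StringBuilder sb = new StringBuilder();
                int c;
                while ((c = r.read()) != -1) {
                    sb.append((char) c);
                }
                // Stands in for the lowercase step of the filter chain.
                return sb.toString().toLowerCase(Locale.ROOT);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Naive tag stripper: drops everything from '<' up to the next '>'. */
    static final class ToyTagStripReader extends FilterReader {
        ToyTagStripReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            boolean inTag = false;
            int c;
            while ((c = in.read()) != -1) {
                if (c == '<') {
                    inTag = true;
                } else if (c == '>' && inTag) {
                    inTag = false;
                } else if (!inTag) {
                    return c;
                }
            }
            return -1;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = 0;
            while (n < len) {
                int c = read();
                if (c == -1) {
                    break;
                }
                buf[off + n++] = (char) c;
            }
            return (n == 0 && len > 0) ? -1 : n;
        }
    }

    /** Mirrors the answer: the char filter is installed in initReader only. */
    static final class StrippingAnalyzer extends ToyAnalyzer {
        private final boolean stripHtml;

        StrippingAnalyzer(boolean stripHtml) {
            this.stripHtml = stripHtml;
        }

        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            return stripHtml ? new ToyTagStripReader(reader) : reader;
        }
    }

    /** Convenience entry point for trying both settings. */
    static String run(boolean stripHtml, String text) {
        return new StrippingAnalyzer(stripHtml).analyze("body", text);
    }

    public static void main(String[] args) {
        System.out.println(run(true, "<b>Hello</b> World"));
        System.out.println(run(false, "<b>Hello</b> World"));
    }
}
```

With stripping enabled the tags never reach the consuming code, because the wrapped reader is the only reader it is given. With stripping disabled the markup passes through unchanged apart from lowercasing.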

Answer 1 (score: 1)