Lucene端口3.6.2分析仪到Lucene 5.5.0

时间:2016-05-25 05:15:40

标签: java lucene

对于Lucene 3.6.2我有以下分析器:

public final class StandardAnalyzerV36 extends Analyzer {

    private Analyzer analyzer;

    public StandardAnalyzerV36() {
        analyzer = new StandardAnalyzer(Version.LUCENE_36);
    }

    public StandardAnalyzerV36(Set<?> stopWords) {
        analyzer = new StandardAnalyzer(Version.LUCENE_36, stopWords);
    }

    @Override
    public final TokenStream tokenStream(String fieldName, Reader reader) {
        return analyzer.tokenStream(fieldName, new HTMLStripCharFilter(CharReader.get(reader)));
    }

    @Override
    public final TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        return analyzer.reusableTokenStream(fieldName, reader);
    }

}

你能帮我把它移到Analyzer Lucene 5.5.0上吗? Analyzer版本在新版本中已更改。

已更新

我已将此分析器重新实现为以下内容:

public final class StandardAnalyzerV36 extends Analyzer {

    public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;  

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {

        final ClassicTokenizer src = new ClassicTokenizer();
        TokenStream tok = new StandardFilter(src);
        tok = new StopFilter(new LowerCaseFilter(tok), STOP_WORDS_SET);
        return new TokenStreamComponents(src, tok);
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        return new HTMLStripCharFilter(reader);
    }

但是我的测试在接下来的电话中失败了:

tokens = LuceneUtils.tokenizeString(analyzer, "[{(RDBMS)}]");

public static List<String> tokenizeString(Analyzer analyzer, String string) {
        List<String> result = new ArrayList<String>();
        try {
            TokenStream stream = analyzer.tokenStream(null, new StringReader(string));
            stream.reset();
            while (stream.incrementToken()) {
                result.add(stream.getAttribute(CharTermAttribute.class).toString());
            }
        } catch (IOException e) {
            // not thrown b/c we're using a string reader...
            throw new RuntimeException(e);
        }
        return result;
    }

有以下例外:

java.lang.IllegalStateException: TokenStream contract violation: close() call missing
    at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
    at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:315)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:143)

此代码有什么问题?

1 个答案:

答案 0 :(得分:0)

最后我开始工作了:

Prelude>