Unable to tokenize words when using Lucene SynonymFilter

Time: 2017-07-05 09:09:13

Tags: java lucene

public class SynonymAnalyzer extends Analyzer {


    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        SynonymMap synonymMap = null;
        SynonymMap.Builder builder=null;
        try {
            addTo(builder,new String[]{"dns"},new String[]{"domain name system"});
            synonymMap = builder.build();
        }catch (Exception e) {
            e.printStackTrace();
        }
        Tokenizer tokenizer = new StandardTokenizer(reader);
        TokenStream filter = new SynonymFilter(tokenizer, synonymMap, true);
        return new TokenStreamComponents(tokenizer, filter);
    }

    private void addTo(SynonymMap.Builder builder, String[] from, String[] to) {
        for (String input : from) {
            for (String output : to) {
                builder.add(new CharsRef(input), new CharsRef(output), false);
            }
        }
    }
}

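If I use this SynonymAnalyzer and search for "dns is down", the query that gets built is +n:domain name system +n:is +n:down. "domain name system" is not split into separate tokens, but I need each word as its own token.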

1 answer:

Answer 0: (score: 1)

When adding a multi-word synonym, you need to separate the words with SynonymMap.WORD_SEPARATOR rather than spaces:

addTo(builder,new String[]{"dns"},new String[]{
    "domain" + SynonymMap.WORD_SEPARATOR
    + "name" + SynonymMap.WORD_SEPARATOR
    + "system"});
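
To make the effect of the separator concrete, here is a minimal, Lucene-free sketch. It assumes that SynonymMap.WORD_SEPARATOR is the NUL character ('\u0000'), which is how the Lucene source declares it. The class MultiWordSynonymSketch and its join and split helpers are hypothetical names used only for illustration; they are not part of Lucene and only mimic how an entry is divided into words.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiWordSynonymSketch {
    // Assumption: mirrors SynonymMap.WORD_SEPARATOR, declared in Lucene as the NUL character.
    public static final char WORD_SEPARATOR = '\u0000';

    // Hypothetical helper: joins words into a single synonym entry using the separator.
    public static String join(String... words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(WORD_SEPARATOR);
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }

    // Hypothetical helper: splits an entry on the separator only (spaces are NOT boundaries).
    public static List<String> split(String entry) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= entry.length(); i++) {
            if (i == entry.length() || entry.charAt(i) == WORD_SEPARATOR) {
                tokens.add(entry.substring(start, i));
                start = i + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Space-separated entry: no separator present, so it stays one token.
        System.out.println(split("domain name system").size());
        // Separator-joined entry: three distinct tokens.
        System.out.println(split(join("domain", "name", "system")));
    }
}
```

With spaces, the whole phrase is a single entry, which matches the single "domain name system" term seen in the query; joined with the separator, the same phrase yields three words.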

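(By the way, createComponents as written will throw an NPE, because builder is never initialized. Based on what you've written, I'll assume that's a mistake in the example rather than in your production code.)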