Which autocomplete index-time analyzer should I use for non-whitespaced user names?

Asked: 2019-04-17 00:48:23

Tags: elasticsearch

I'm a relative neophyte when it comes to the usages of ElasticSearch.

Currently, I'm trying to set up autocomplete functionality for the searching of usernames in our app, but I ran into an issue with the completion suggester not giving me the expected results. Here is how I mapped the properties initially.

        'properties' : {  
          'username' : {  
            'type' : 'keyword',  
            'fields' : {  
              'text' : {  
                'type' : 'text'  
              },  
              'suggest' : {  
                'type' : 'completion'  
              }  
            }  
          }
        }

The usernames will be limited to capitalized alphanumeric characters only. (0-9, A-Z, no whitespace)

The problem I was running into was that the exact match TIM was being weighted the same as 3TIM, due to the default simple analyzer. But looking at the standard analyzer, at least according to this, it seems like only the words between whitespace are tokenized.

Can I expect my intended behavior by specifying the standard tokenizer on the username.suggest field? Or am I trying to do this completely wrong and I should be using a totally different analyzer and edge_ngrams instead?

1 answer:

Answer 0: (score: 0)

When using the `completion` field type, you usually don't need any edge n-grams; prefix matching is what the `completion` field does internally.

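You're right, though, that the `simple` analyzer used by default for the `completion` type gets rid of all digits, i.e. it splits the input whenever it encounters a character that is not a letter. So it only works as expected if your data contains nothing but [a-zA-Z], which is not the case here.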
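To see why TIM and 3TIM end up equivalent, here is a rough Python stand-in for the two tokenization behaviours. It is only an approximation for ASCII usernames, not Elasticsearch's actual implementation, and the function names `simple_like` and `digit_preserving` are made up for illustration.

```python
import re

def simple_like(text):
    """Approximates the `simple` analyzer: split on non-letters, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+", text)]

def digit_preserving(text):
    """Approximates an analyzer that keeps digits inside a token, then lowercases."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

for name in ("TIM", "3TIM"):
    # With simple_like both names collapse to the same token, so a suggester
    # built on it cannot tell them apart; digit_preserving keeps them distinct.
    print(name, simple_like(name), digit_preserving(name))
```

With the letters-only split, the leading `3` is discarded before anything is indexed, which matches the identical weighting described in the question.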

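Since your input consists of a single-token username, you can get away with using the `standard` analyzer. If there were more than one token, I'd discourage you from using it because of the stop words token filter, but since that is not the case, you can safely use it.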
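As a sketch, the mapping from the question with the `standard` analyzer set on the `username.suggest` sub-field could be built like this. It is written as a Python dict that serializes to the JSON request body; the variable name `mapping` is arbitrary, and everything except the added `analyzer` key is copied from the question.

```python
import json

# Same mapping as in the question; the only change is the explicit
# "analyzer" on the completion sub-field (its default would be "simple").
mapping = {
    "properties": {
        "username": {
            "type": "keyword",
            "fields": {
                "text": {"type": "text"},
                "suggest": {"type": "completion", "analyzer": "standard"},
            },
        }
    }
}

# Print the JSON that would be sent as the mapping body.
print(json.dumps(mapping, indent=2))
```

Changing the analyzer of an existing field generally means creating a new index and reindexing the data.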

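If you ever have to use completion on multi-token input, what usually works best is to create a `custom` analyzer that leverages the `whitespace` tokenizer and the {{1 }} token filter, like this: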

classic
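A minimal sketch of what such index settings could look like, written as a Python dict that serializes to the JSON request body. Two things here are assumptions rather than taken from the text above: the token filter is assumed to be `lowercase`, and the analyzer name `username_autocomplete` is hypothetical. The mapping uses the typeless form (Elasticsearch 7+).

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Hypothetical analyzer name, chosen for this example only.
                "username_autocomplete": {
                    "type": "custom",
                    "tokenizer": "whitespace",   # split on whitespace only, digits stay in the token
                    "filter": ["lowercase"],     # assumed filter: makes matching case-insensitive
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "username": {
                "type": "keyword",
                "fields": {
                    "suggest": {
                        "type": "completion",
                        "analyzer": "username_autocomplete",
                    }
                },
            }
        }
    },
}

# Print the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```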

Your mileage may vary, but the above analyzer is a good basis on which you can rely.