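I have installed the Smart Chinese Analysis plugin for Elasticsearch on my ES cluster, but I cannot find any documentation on how to specify the correct analyzer. I assume I need to set up a tokenizer plus filters that specify the stop words and a stemmer...
For example, for Dutch: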
"dutch": {
"type": "custom",
"tokenizer": "uax_url_email",
"filter": ["lowercase", "asciifolding", "dutch_stemmer_filter", "dutch_stop_filter"]
}
with:
"dutch_stemmer_filter": {
"type": "stemmer",
"name": "dutch"
},
"dutch_stop_filter": {
"type": "stop",
"stopwords": ["_dutch_"]
}
How do I configure my analyzer for Chinese?
Answer 0 (score: 7)
Try this on an index (the analyzer is 'smartcn' and the tokenizer is 'smartcn_tokenizer'):
PUT /test_chinese
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"default": {
"type": "smartcn"
}
}
}
}
}
}
GET /test_chinese/_analyze?text='叻出色'
It should output two tokens (taken from the plugin test classes):
{
"tokens": [
{
"token": "叻",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 2
},
{
"token": "出色",
"start_offset": 2,
"end_offset": 4,
"type": "word",
"position": 3
}
]
}
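If you want a custom analyzer shaped like the Dutch one in the question, rather than the prebuilt 'smartcn' analyzer, one possible approach is to combine 'smartcn_tokenizer' with your own filter chain. The sketch below only builds the request bodies and prints them, so it does not need a running cluster. It rests on several assumptions: the index name `test_chinese_custom`, the analyzer name `chinese_custom`, and the filter name `chinese_extra_stop` are made up for illustration, and the `smartcn_stop` filter is only available in more recent versions of the plugin, so drop it if your version rejects it. No stemmer filter is included because Chinese words are not inflected. The second body uses the JSON form of `_analyze`, which newer Elasticsearch versions expect instead of the `text` query-string parameter.

```python
import json

# Hypothetical names chosen for this sketch; they do not come from the answer above.
INDEX_NAME = "test_chinese_custom"
ANALYZER_NAME = "chinese_custom"
EXTRA_STOP_FILTER = "chinese_extra_stop"


def build_index_settings(extra_stopwords=None):
    """Build index settings for a custom analyzer based on smartcn_tokenizer.

    Assumes the Smart Chinese Analysis plugin is installed and that its
    version provides the smartcn_stop token filter.
    """
    filters = ["lowercase", "smartcn_stop"]
    analysis = {
        "analyzer": {
            ANALYZER_NAME: {
                "type": "custom",
                "tokenizer": "smartcn_tokenizer",
                "filter": filters,
            }
        }
    }
    if extra_stopwords:
        # Optional user-defined stop filter, analogous to dutch_stop_filter.
        analysis["filter"] = {
            EXTRA_STOP_FILTER: {"type": "stop", "stopwords": list(extra_stopwords)}
        }
        filters.append(EXTRA_STOP_FILTER)
    return {"settings": {"index": {"analysis": analysis}}}


def build_analyze_body(text):
    """Build the JSON body for GET /<index>/_analyze."""
    return {"analyzer": ANALYZER_NAME, "text": text}


if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(build_index_settings(["的"]), ensure_ascii=False, indent=2))
    print(f"GET /{INDEX_NAME}/_analyze")
    print(json.dumps(build_analyze_body("叻出色"), ensure_ascii=False, indent=2))
```

The printed bodies can be pasted into the console or sent with any HTTP client. Whether the resulting tokens match the output above depends on the filters you keep, so check them with `_analyze` on your own cluster.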