I am using the following settings to split the strings in my index.
{
  "settings": {
    "analysis": {
      "filter": {
        "filter_stop_word": {
          "type": "stop"
        },
        "custom_unique": {
          "type": "unique"
        },
        "custom_shingle": {
          "type": "shingle",
          "token_separator": "",
          "max_shingle_size": "3",
          "filler_token": ""
        },
        "filter_word_delimiter": {
          "type": "word_delimiter"
        }
      },
      "analyzer": {
        "en_us": {
          "filter": [
            "filter_stop_word",
            "filter_word_delimiter",
            "custom_shingle",
            "lowercase",
            "unique"
          ],
          "tokenizer": "standard"
        }
      }
    }
  }
}
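Input: "Treeviewcontrol is one of the tools"
If I feed the above input to my settings, it produces the following output:
[tree,treeview,treeviewcontrol,view,viewcontrol,Viewcontrolone,controlone,tool]
But the output I need is: tree, view, control, treeview, viewcontrol, one, tool
Tokens should not be joined across whitespace. Can anyone help me?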
Answer 0 (score: 0)
Using a camel-case analyzer (here built on a pattern_capture token filter), you can break tokens based on case:
curl -XPUT localhost:9200/test/ -d '{
  "settings" : {
    "analysis" : {
      "filter" : {
        "camelFilter" : {
          "type" : "pattern_capture",
          "preserve_original" : false,
          "patterns" : [
            "(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)",
            "(\\d+)"
          ]
        }
      },
      "analyzer" : {
        "camel" : {
          "tokenizer" : "pattern",
          "filter" : [ "camelFilter", "lowercase" ]
        }
      }
    }
  }
}'
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=camel' -d 'qboxElasticsearchServiceProvider'
{
  "tokens" : [ {
    "token" : "qbox",
    "start_offset" : 0,
    "end_offset" : 32,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "elasticsearch",
    "start_offset" : 0,
    "end_offset" : 32,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "service",
    "start_offset" : 0,
    "end_offset" : 32,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "provider",
    "start_offset" : 0,
    "end_offset" : 32,
    "type" : "word",
    "position" : 1
  } ]
}
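
To see what the camelFilter patterns capture without running a cluster, here is a rough Python sketch that mimics them. It is only an approximation: Python's `re` module has no `\p{Ll}` / `\p{Lu}`, so ASCII character classes are used instead, and the function names `camel_split` and `adjacent_pairs` are made up for this illustration and are not part of Elasticsearch. The second helper just shows what "joining only pieces from the same original word" from the question would look like if done on the application side.

```python
import re

# ASCII-only stand-ins for the filter's patterns (assumption: input is ASCII).
CAMEL_PATTERNS = [
    re.compile(r"([a-z]+|[A-Z][a-z]+|[A-Z]+)"),
    re.compile(r"(\d+)"),
]


def camel_split(token):
    """Return the lowercased camel-case pieces of one token, in order of appearance."""
    pieces = []
    for pattern in CAMEL_PATTERNS:
        for match in pattern.finditer(token):
            pieces.append((match.start(), match.group(1).lower()))
    # Order pieces by where they start in the original token.
    pieces.sort(key=lambda piece: piece[0])
    return [text for _, text in pieces]


def adjacent_pairs(pieces):
    """Join neighbouring pieces of a single token (hypothetical helper)."""
    return [left + right for left, right in zip(pieces, pieces[1:])]


if __name__ == "__main__":
    print(camel_split("qboxElasticsearchServiceProvider"))
    parts = camel_split("TreeViewControl")
    print(parts + adjacent_pairs(parts))
```

Because the pairs are built per token, nothing like "controlone" can appear: pieces from different whitespace-separated words are never combined.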