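The input is a list of person names, and I want to build an exact match that is slightly fuzzy.
The indexed text is "Bao-An Feng", and my analyzer is shown below: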
PUT trim
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "word_joiner": {
            "type": "shingle",
            "output_unigrams": false,
            "token_separator": "",
            "output_unigrams_if_no_shingles": true,
            "max_shingle_size": 5
          }
        },
        "analyzer": {
          "word_join_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "word_joiner"
            ]
          }
        },
        "tokenizer": {}
      }
    }
  }
}
It produces three tokens:
{
  "tokens": [
    {
      "token": "baoan",
      "start_offset": 0,
      "end_offset": 6,
      "type": "shingle",
      "position": 0
    },
    {
      "token": "baoanfeng",
      "start_offset": 0,
      "end_offset": 11,
      "type": "shingle",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "anfeng",
      "start_offset": 4,
      "end_offset": 11,
      "type": "shingle",
      "position": 1
    }
  ]
}
I only want "baoanfeng". I can't use "min_shingle_size", because the input may contain just two words.
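To see where the three tokens come from, the analyzer can be roughly emulated in Python. This is only a sketch, not Elasticsearch itself: `shingles` is a hypothetical helper, the regex split stands in for the standard tokenizer on simple ASCII names, and the minimum shingle size is assumed to be the default of 2.

```python
import re

def shingles(text, min_size=2, max_size=5, separator=""):
    """Rough emulation of: standard tokenizer -> lowercase -> shingle filter
    with output_unigrams=false. An illustration only, not Elasticsearch code."""
    # Assumption: splitting on non-alphanumerics approximates the standard
    # tokenizer for simple ASCII names such as "Bao-An Feng".
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
    if len(words) < min_size:
        # output_unigrams_if_no_shingles=true: fall back to the single words
        return words
    out = []
    for start in range(len(words)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                # token_separator="" joins the words without a space
                out.append(separator.join(words[start:start + size]))
    return out

print(shingles("Bao-An Feng"))  # ['baoan', 'baoanfeng', 'anfeng']
```

Every window of two or more adjacent words becomes a token, which is why the shorter combinations show up alongside the full name.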
Answer 0 (score: 0)
If what you need is the longest shingle, I'm not sure why you would use the shingle filter at all...
Why not simply use the keyword tokenizer with a pattern filter (pattern_replace) that removes all non-word characters? Like this:
PUT trim
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "pattern": {
            "type": "pattern_replace",
            "pattern": "\\W+",
            "replacement": ""
          }
        },
        "analyzer": {
          "word_join_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "pattern"
            ]
          }
        },
        "tokenizer": {}
      }
    }
  }
}
Then test it with the _analyze API:
POST trim/_analyze
{
  "analyzer": "word_join_analyzer",
  "text": "Bao-An Feng"
}
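For reference, the keyword tokenizer, lowercase, pattern_replace chain can be approximated in a few lines of Python. This is a sketch rather than what Elasticsearch actually runs: `word_join` is a hypothetical name, and `re.ASCII` is used on the assumption that `\W` in a Java regex is ASCII-only by default.

```python
import re

def word_join(text):
    # keyword tokenizer: the whole input stays a single token
    token = text
    # lowercase filter
    token = token.lower()
    # pattern_replace filter: drop every run of non-word characters
    return re.sub(r"\W+", "", token, flags=re.ASCII)

print(word_join("Bao-An Feng"))  # baoanfeng
print(word_join("Bao An"))       # baoan
```

However many words the name contains, it collapses into one token, so two-word input needs no special handling.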