This is my regex...
^ +|( +; +)| +$
Below is a screenshot of the regex with a test string; I used a screenshot so that the spaces are visible...
All I want to do is format the string like this:
Trimester 1;Trimester 2;Trimester 3
So here is my custom analyzer...
"analysis": {
  "analyzer": {
    "semi_colon_analyzer": {
      "tokenizer": "my_tokenizer"
    },
    "comma_analyzer": {
      "type": "pattern",
      "pattern": ",",
      "lowercase": false
    }
  },
  "tokenizer": {
    "my_tokenizer": {
      "type": "pattern",
      "pattern": "( +; +)",
      "replacement": "$1;"
    }
  }
}
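As far as I can tell, "replacement" is not a pattern tokenizer option in Elasticsearch ("replacement" belongs to the pattern_replace char filter), so the tokenizer only splits on matches of the pattern. A rough Python sketch of that splitting behavior, with Python's re module standing in for the Java regex engine:

```python
import re

# The pattern tokenizer splits input on matches of "pattern"; the
# "replacement" setting above would simply have no effect, since
# replacement is a pattern_replace char_filter option, not a
# tokenizer option.
text = "Trimester 1 ; Trimester 2 ; Trimester 3"

# Non-capturing form of "( +; +)", because Python's re.split
# keeps captured groups in its result, unlike Java's split.
tokens = re.split(r" +; +", text)
print(tokens)  # ['Trimester 1', 'Trimester 2', 'Trimester 3']
```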
This works on regex101.com, but it does not work in Elastic.
Can someone help me with how to implement this regex in Elasticsearch?
Thanks

EDIT

Output of _analyze?analyzer=semi_colon_analyzer:
{
  "tokens": [
    {
      "token": "Trimester",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "1",
      "start_offset": 10,
      "end_offset": 11,
      "type": "<NUM>",
      "position": 1
    },
    {
      "token": "Trimester",
      "start_offset": 13,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "2",
      "start_offset": 23,
      "end_offset": 24,
      "type": "<NUM>",
      "position": 3
    },
    {
      "token": "Trimester",
      "start_offset": 26,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "3",
      "start_offset": 36,
      "end_offset": 37,
      "type": "<NUM>",
      "position": 5
    }
  ]
}
Answer 0 (score: 0)
I think you need to use a char_filter. Try this:
{
  "analysis": {
    "analyzer": {
      "semi_colon_analyzer": {
        "char_filter": "my_char_filter",
        "tokenizer": "my_tokenizer",
        "filter": "trim"
      },
      "comma_analyzer": {
        "type": "pattern",
        "pattern": ",",
        "lowercase": false
      }
    },
    "char_filter": {
      "my_char_filter": {
        "type": "pattern_replace",
        "pattern": "(\\s+;\\s+)",
        "replacement": ";"
      }
    },
    "tokenizer": {
      "my_tokenizer": {
        "type": "pattern",
        "pattern": ";"
      }
    }
  }
}
If you analyze Trimester 1 ; Trimester 2 with the analyzer created above, you will get:
{
  "tokens": [
    {
      "token": "Trimester 1",
      "start_offset": 0,
      "end_offset": 12,
      "type": "word",
      "position": 0
    },
    {
      "token": "trimester 2",
      "start_offset": 19,
      "end_offset": 33,
      "type": "word",
      "position": 1
    }
  ]
}
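To see why this works end to end, here is a minimal Python sketch that simulates the three stages of the analyzer above (pattern_replace char filter, pattern tokenizer, trim token filter); Python's re stands in for the Java regex engine, so this is an approximation, not Elasticsearch itself:

```python
import re

def semi_colon_analyze(text):
    # Stage 1: pattern_replace char filter collapses " ; " to ";"
    normalized = re.sub(r"\s+;\s+", ";", text)
    # Stage 2: pattern tokenizer splits on ";"
    tokens = normalized.split(";")
    # Stage 3: trim filter strips surrounding whitespace;
    # empty tokens are dropped
    return [t.strip() for t in tokens if t.strip()]

print(semi_colon_analyze("Trimester 1 ; Trimester 2 ; Trimester 3"))
# ['Trimester 1', 'Trimester 2', 'Trimester 3']
```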
Answer 1 (score: 0)
I arrived at a solution by making a few tweaks to the original mapping...
"settings": {
  "number_of_shards": "1",
  "number_of_replicas": "0",
  "analysis": {
    "analyzer": {
      "semi_colon_analyzer": {
        "tokenizer": "my_tokenizer"
      },
      "comma_analyzer": {
        "type": "pattern",
        "pattern": ",",
        "lowercase": false
      }
    },
    "tokenizer": {
      "my_tokenizer": {
        "type": "pattern",
        "pattern": "^ +|( *; *)| +$",
        "replacement": "$1;"
      }
    }
  }
},
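In case it helps anyone, the anchored pattern above can be approximated in Python to check what the tokenizer produces. Two caveats: Elasticsearch's pattern tokenizer does not use the "replacement" setting (it only splits on the pattern), and Python's re.split keeps captured groups in its output, so a non-capturing variant of the pattern is used here:

```python
import re

# Non-capturing approximation of "^ +|( *; *)| +$"
PATTERN = r"^ +| *; *| +$"

def tokenize(text):
    parts = re.split(PATTERN, text)
    # The pattern tokenizer discards empty tokens, e.g. those
    # produced by the leading/trailing-space alternatives
    return [p for p in parts if p]

print(tokenize("  Trimester 1 ; Trimester 2 ; Trimester 3  "))
# ['Trimester 1', 'Trimester 2', 'Trimester 3']
```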