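I want to capture all occurrences in a string that match a particular regular expression. I am using DataWeave 2.0 (which means Mule Runtime 4.3 and, in my case, Anypoint Studio 7.5).
I tried scan() and match() from the DataWeave core library, but I can't quite get the result I need.
Here are some of the things I tried: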
%dw 2.0
output application/json
// sample input with hashtag keywords
var microList = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
withscan: microList scan /(#[^\s]*).*/,
sanitized: microList replace /\n/
with ' ',
sani_match: microList replace /\n/
with ' ' match /.*(#[^\s]*).*/, // gives full string and last match
sani_scan: microList replace /\n/
with ' ' scan /.*(#[^\s]*).*/ // gives array of arrays, string and last match
}
Here are the respective results:
{
"withscan": [
[
"#downtownmalls now!",
"#downtownmalls"
],
[
"#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
"#shoplocal"
]
],
"sanitized": "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
"sani_match": [
"Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
"#downtowndancehalls"
],
"sani_scan": [
[
"Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
"#downtowndancehalls"
]
]
}
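In the first example, the parser seems to be processing the input line by line, so the result array has one element per line. Each element consists of the fully matched portion and the captured portion for the first occurrence of the pattern on that line.
After stripping the newlines, the third example (sani_match) gives me an array with the fully matched portion and the captured portion, which is the last occurrence of the pattern on the line.
The final pattern (sani_scan) gives a similar result; the only difference is that the result is nested as an element inside an array.
All I want is an array of all occurrences of the specified pattern.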
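Answer 0 (score: 3)
If you want to capture all occurrences in a string that match a particular regular expression, I found that the magic words are "overlapping matches".
If all you really want is to get the hashtags out of the string, just use Valdi_Bo's solution.
To enable single-line mode in Java (so that the dot also matches newlines), you need to add (?s) at the start of the pattern.
Script: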
%dw 2.0
output application/json
var str = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
// (?s) is the single-line modifier
// (?=(X)). enable overlapping matches
matchUntilEnd: str scan(/(?s)(?=(#([^\s]*).*))./) map $[1],
justTags: str scan(/(?s)#([^\s]*)/) map $[1],
Valdi_BoSolutionWithGroups: str scan(/#([\S]+)/) map $[1]
}
Output:
{
"matchUntilEnd": [
"#downtownmalls now!\n#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
"#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
"#giveaway @barry sent you. #downtowndancehalls",
"#downtowndancehalls"
],
"justTags": [
"downtownmalls",
"shoplocal",
"giveaway",
"downtowndancehalls"
],
"Valdi_BoSolutionWithGroups": [
"downtownmalls",
"shoplocal",
"giveaway",
"downtowndancehalls"
]
}
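To see why the (?=(X)). construct yields overlapping matches, a smaller input may help. The following is a minimal sketch, assuming scan behaves as in the script above (one inner array per match, with the whole match at index 0 and capture group 1 at index 1). The input "abc" is made up for illustration.

```dataweave
%dw 2.0
output application/json
---
// The lookahead (?=(.+)) captures everything from the current position
// to the end of the line without consuming it. The trailing . then
// consumes a single character, so the next match starts one position later.
"abc" scan /(?=(.+))./ map $[1]
```

Under those assumptions this should evaluate to ["abc", "bc", "c"]: every suffix is captured, even though each match itself consumes only one character.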
Answer 1 (score: 1)
If you want to match all "words" (actually runs of non-whitespace characters) that start with #, use a pattern like:
#[\S]+
That is:
- # stands for itself,
- [\S]+ is a non-empty sequence of non-whitespace characters.
I think you can do this job without a capturing group.
Another hint: be very careful when using .* in a pattern, because it may match too little or too much.
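As a minimal DataWeave sketch of the #[\S]+ pattern without a capturing group: this assumes scan returns one inner array per match with the whole match at index 0, as the outputs elsewhere in this thread suggest. The variable name str is only illustrative.

```dataweave
%dw 2.0
output application/json
// sample input from the question; \n marks the line break
var str = "Someone is giving away millions. See @realmcsrooge at #downtownmalls now!\n#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls"
---
// each scan result is [whole match, groups...]; keep only the whole match
(str scan /#[\S]+/) map $[0]
```

With that assumption the result should be ["#downtownmalls", "#shoplocal", "#giveaway", "#downtowndancehalls"], with the leading # kept because the whole match is used rather than a group.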
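In your first example (withscan), the trailing .* of the pattern "consumes" the rest of the current line (up to, but not including, the newline, because the dot does not match a newline).
So if the rest of that line contains another "#..." fragment, it has no chance of being matched by your capturing group.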
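The effect of the trailing .* can be seen on a single line with two tags. This is a sketch, assuming DataWeave's scan reports each match as [whole match, group 1]. The input "#a #b" and the key names are made up for illustration.

```dataweave
%dw 2.0
output application/json
var line = "#a #b"
---
{
  // .* swallows " #b", so only one match is found: [["#a #b", "#a"]]
  withTrailingDotStar: line scan /(#[^\s]*).*/,
  // without .*, scanning resumes right after "#a": [["#a", "#a"], ["#b", "#b"]]
  withoutDotStar: line scan /(#[^\s]*)/
}
```

Dropping the trailing .* is what lets the second tag on the same line be found as a match of its own.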
To capture all the "#..." strings, you would normally pass the global option to the regex processor, but perhaps DataWeave applies this option by default (I don't know this language).
See also the working example at https://regex101.com/r/NPiMok/1 (a handy regex testing site).