DataWeave 2.0: match all occurrences of a regular expression

Time: 2020-05-19 02:26:45

Tags: regex dataweave mulesoft

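I want to capture every occurrence in a string that matches a particular regular expression. I am using DataWeave 2.0 (which means Mule Runtime 4.3 and, in my case, Anypoint Studio 7.5).

I tried scan() and match() from the DataWeave core library, but I cannot quite get the result I need.

Here are some of the things I tried: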

%dw 2.0
output application/json

// sample input with hashtag keywords
var microList = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    withscan: microList scan /(#[^\s]*).*/,
    sanitized: microList replace /\n/ 
        with ' ',
    sani_match: microList replace /\n/ 
        with ' ' match /.*(#[^\s]*).*/, // gives full string and last match
    sani_scan: microList replace /\n/ 
        with ' ' scan /.*(#[^\s]*).*/   // gives array of arrays, string and last match
}

Here are the respective results:

{
  "withscan": [
    [
      "#downtownmalls now!",
      "#downtownmalls"
    ],
    [
      "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#shoplocal"
    ]
  ],
  "sanitized": "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
  "sani_match": [
    "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "sani_scan": [
    [
      "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#downtowndancehalls"
    ]
  ]
}

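In the first example (withscan), the parser appears to work line by line, so the resulting array has one element per line. Each element consists of the full matched portion and the captured group (the tagged portion), taken from the first occurrence of the pattern on that line.

After stripping the newline, the third example (sani_match) gives me an array with the full matched portion and the captured group, which is the last occurrence of the pattern on the line.

The final pattern (sani_scan) gives a similar result; the only difference is that the result is nested as an element inside another array.

What I want is simply an array of all occurrences of the specified pattern.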

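2 answers:

Answer 0: (score: 3)

If you want to capture every match of a particular regex in a string, I found that the magic phrase is "overlapping matches".

If all you really want is to get the hashtags out of the string, just use Valdi_Bo's solution.

To enable single-line (DOTALL) mode in a Java regex, so that the dot also matches newlines, add (?s) at the start of the pattern.

Script: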

%dw 2.0
output application/json

var str = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    // (?s) is the single-line modifier
    // (?=(X)). enable overlapping matches
    matchUntilEnd: str scan(/(?s)(?=(#([^\s]*).*))./) map $[1],
    justTags: str scan(/(?s)#([^\s]*)/) map $[1],
    Valdi_BoSolutionWithGroups: str scan(/#([\S]+)/) map $[1]
}

Output:

{
  "matchUntilEnd": [
    "#downtownmalls now!\n#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "justTags": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ],
  "Valdi_BoSolutionWithGroups": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ]
}
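
The overlapping-match trick can be checked outside DataWeave. The following is a minimal sketch in plain Java, assuming DataWeave regexes behave like java.util.regex patterns (as the reference to Java above suggests). The class name OverlappingMatches, the helper group1, and the shortened sample string are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of DataWeave or Mule.
class OverlappingMatches {
    // Collects capture group 1 of every match, similar in spirit to `scan(...) map $[1]`.
    static List<String> group1(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "See #a now!\n#b and #c";
        // The lookahead (?=(...)) captures text without consuming it; the final
        // dot consumes only the '#', so the next search starts right after it.
        // (?s) lets .* run across the newline to the end of the string.
        List<String> tails = group1("(?s)(?=(#\\S*.*)).", str);
        // One entry per '#', each running from that tag to the end of the string.
        System.out.println(tails.size());
    }
}
```

Because the lookahead consumes nothing, each captured tail may overlap the previous one, which is why every tag gets its own entry even though the first capture already spans the whole remainder.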

Answer 1: (score: 1)

If you want to match every "word" (actually, every run of non-whitespace characters) that starts with #, use a pattern like:

#[\S]+

That is:

  • # - stands for itself,
  • [\S]+ - a non-empty sequence of non-whitespace characters.

I think you can do this without a capturing group.
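
As a rough illustration of matching without a capturing group, here is a minimal sketch in plain Java, assuming the DataWeave regex engine behaves like java.util.regex. The class name HashtagScan and the method hashtags are hypothetical. It collects the whole match for each occurrence of #\S+ in the sample string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to illustrate the pattern.
class HashtagScan {
    // Returns the full text of every match of #\S+ (no capturing group needed).
    static List<String> hashtags(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("#\\S+").matcher(input);
        while (m.find()) {
            out.add(m.group()); // group() with no argument is the whole match
        }
        return out;
    }

    public static void main(String[] args) {
        String microList = "Someone is giving away millions. See @realmcsrooge at #downtownmalls now!\n"
                + "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls";
        // prints [#downtownmalls, #shoplocal, #giveaway, #downtowndancehalls]
        System.out.println(hashtags(microList));
    }
}
```

In DataWeave terms, the whole match corresponds to the first element of each inner array returned by scan, so selecting index 0 instead of index 1 should give the same list, with the leading # included.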

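Another tip: be very careful when using .* in a pattern, because it can match too little or too much.

In your first example (withscan), the trailing .* in the pattern "consumes" the rest of the current line (up to, but not including, the newline, because the dot does not match a newline). So if the rest of that line contains another "#..." fragment, it has no chance of being matched by your capturing group.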
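
The effect of the trailing .* can be reproduced with a small experiment. This is a sketch in plain Java, again assuming DataWeave follows java.util.regex semantics. The class name GreedyTail, the helper firstGroups, and the short two-line sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to contrast the two patterns.
class GreedyTail {
    // Collects capture group 1 of every successive match.
    static List<String> firstGroups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "at #x now #y\n#z then #w";
        // The trailing .* swallows the rest of each line, hiding #y and #w.
        System.out.println(firstGroups("(#[^\\s]*).*", text)); // prints [#x, #z]
        // Without the trailing .*, every tag is found.
        System.out.println(firstGroups("(#[^\\s]*)", text));   // prints [#x, #y, #z, #w]
    }
}
```

The first pattern yields one tag per line, which matches the shape of the withscan result in the question.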

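To capture all "#..." strings, you would normally pass the global option to the regex processor, but perhaps DataWeave applies this option by default (I don't know this language).

See also the working example at https://regex101.com/r/NPiMok/1 (a handy regex testing site).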