Question

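I want to capture all matches of a particular regular expression in a string. I am using DataWeave 2.0 (which implies Mule Runtime 4.3 and, in my case, Anypoint Studio 7.5).

I tried using scan() and match() from the DataWeave core library, but I can't quite get the result I want.

Here are some of the things I have tried: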

%dw 2.0
output application/json

// sample input with hashtag keywords
var microList = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    withscan: microList scan /(#[^\s]*).*/,
    sanitized: microList replace /\n/ 
        with ' ',
    sani_match: microList replace /\n/ 
        with ' ' match /.*(#[^\s]*).*/, // gives full string and last match
    sani_scan: microList replace /\n/ 
        with ' ' scan /.*(#[^\s]*).*/   // gives array of arrays, string and last match
}

Here are the respective results:

{
  "withscan": [
    [
      "#downtownmalls now!",
      "#downtownmalls"
    ],
    [
      "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#shoplocal"
    ]
  ],
  "sanitized": "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
  "sani_match": [
    "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "sani_scan": [
    [
      "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#downtowndancehalls"
    ]
  ]
}

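In the first example, the parser appears to process the input line by line, so the result array has one element per line. Each element consists of the full matched portion and the captured (tagged) portion, which is the first occurrence of the pattern on that line.

After stripping the newline, the third example (sani_match) gives me an array with the full matched portion and the captured portion, which is the last occurrence of the pattern on the line.

The final pattern (sani_scan) gives a similar result; the only difference is that the result is nested as an element inside an outer array.

What I want is simply an array of all occurrences of the specified pattern.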

Answer 1

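If you want to capture every match of a particular regex in a string, including matches that overlap each other, I found that the magic phrase is "overlapping matches".

If all you really want is the hashtags from the string, just use Valdi_Bo's solution.

To enable the single-line flag in Java, you need to add (?s) at the start of the pattern.

Script: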
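To make the effect of (?s) concrete, here is a minimal sketch in plain Java, on the assumption that DataWeave patterns follow Java regex syntax as the remark above implies. The class name, method name, and sample text are hypothetical and only serve the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotAllDemo {
    // Returns the text matched by "#.*" starting at the first '#',
    // with or without the inline (?s) flag; null if nothing matches.
    static String fromFirstTag(String text, boolean dotAll) {
        String regex = (dotAll ? "(?s)" : "") + "#.*";
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "at #one now!\n#two end";
        // Without (?s) the dot stops at the newline.
        System.out.println(fromFirstTag(text, false));
        // With (?s) the dot also matches the newline, so the match runs to the end.
        System.out.println(fromFirstTag(text, true));
    }
}
```

Without the flag the match ends at the line break; with it, the match continues across lines.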

%dw 2.0
output application/json

var str = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    // (?s) is the single-line modifier
    // (?=(X)). enable overlapping matches
    matchUntilEnd: str scan(/(?s)(?=(#([^\s]*).*))./) map $[1],
    justTags: str scan(/(?s)#([^\s]*)/) map $[1],
    Valdi_BoSolutionWithGroups: str scan(/#([\S]+)/) map $[1]
}

Output:

{
  "matchUntilEnd": [
    "#downtownmalls now!\n#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "justTags": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ],
  "Valdi_BoSolutionWithGroups": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ]
}
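The lookahead trick behind matchUntilEnd can also be tried outside Mule. The following is a rough Java sketch, assuming the same regex semantics apply; the class and method names are hypothetical, and the pattern is shortened to (?s)(?=(#.*)). because only the outer group is read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OverlappingTags {
    // For each '#', capture everything from that '#' to the end of the string.
    // The lookahead captures without consuming; the trailing '.' then consumes
    // one character so the next search starts just after the current '#'.
    static List<String> tailsFromEachTag(String text) {
        Matcher m = Pattern.compile("(?s)(?=(#.*)).").matcher(text);
        List<String> tails = new ArrayList<>();
        while (m.find()) {
            tails.add(m.group(1));
        }
        return tails;
    }

    public static void main(String[] args) {
        System.out.println(tailsFromEachTag("a #x b\n#y c"));
    }
}
```

Because the lookahead itself consumes nothing, the captured tails are allowed to overlap, which an ordinary consuming group would not permit.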

Answer 2

If you want to match all "words" (actually runs of non-whitespace characters) that start with #, use a pattern like:

#[\S]+

That is:

# - stands for itself,
[\S]+ - a non-empty sequence of non-whitespace characters.

In my opinion, you can do this job even without any capturing group.
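As a sketch of the group-free approach in Java (not DataWeave, which this answer does not cover), the whole match can be read directly on each iteration. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagsWithoutGroups {
    // Collects every "#..." token; no capturing group is needed because
    // group() with no argument returns the whole match.
    static List<String> findTags(String text) {
        Matcher m = Pattern.compile("#[\\S]+").matcher(text);
        List<String> tags = new ArrayList<>();
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String text = "See @realmcsrooge at #downtownmalls now!\n"
                + "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls";
        System.out.println(findTags(text));
    }
}
```

Calling find() repeatedly plays the role that a "global" option plays in other regex processors.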

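Another hint: use .* in a pattern with great care, because it can match either too little or too much.

In your first example (withscan), the trailing .* in the pattern "consumes" the rest of the current line (up to, but not including, the newline, because the dot does not match a newline). So if the rest of that line contains another "#..." fragment, it has no chance of being matched by your capturing group.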
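The consuming effect is easy to reproduce in Java. This is a small sketch with hypothetical names and a shortened sample input, comparing the question's pattern with and without the trailing .* part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrailingDotStar {
    // Returns the first capturing group of every successive match.
    static List<String> firstGroups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> groups = new ArrayList<>();
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String text = "#a x #b\n#c y #d";
        // The trailing .* swallows the rest of each line, hiding #b and #d.
        System.out.println(firstGroups("(#[^\\s]*).*", text));
        // Without it, every tag is found.
        System.out.println(firstGroups("(#[^\\s]*)", text));
    }
}
```

With the trailing .* only the first tag on each line survives, which matches the shape of the withscan result in the question.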

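To capture all "#..." strings, you would normally pass the global option to the regex processor, but perhaps DataWeave applies this option by default (I don't know this language).

See also the working example at https://regex101.com/r/NPiMok/1 (a handy regex testing site).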

DataWeave 2.0: match all occurrences of a regular expression
