Question

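I've been fighting with this regular expression all day.

The task looks simple: I have a number of XML tag names, and I must replace (mask) their content.

For example: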

<Exony_Credit_Card_ID>242394798</Exony_Credit_Card_ID>

must become

<Exony_Credit_Card_ID>filtered</Exony_Credit_Card_ID>

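There are several such tags, with different names.

How do I match any text inside the tags, but not the tags themselves?

Edit: I should clarify again. Capturing the tags in groups and reusing those groups in the replacement does not work in my case, because once I add other tags to the expression, the group numbers of the later alternatives are different. For example: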

"(<Exony_Credit_Card_ID>).+(</Exony_Credit_Card_ID>)|(<Billing_Postcode>).+(</Billing_Postcode>)"

with the replacement string "$1filtered$2"

replaceAll does not work, because when the regex matches Billing_Postcode, its groups are 3 and 4 rather than 1 and 2.
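
To make the problem concrete, here is a minimal sketch that runs the expression from the question against both tags. The class name GroupNumberDemo, the method name mask and the sample postcode are made up for illustration. For the second branch, groups 1 and 2 do not take part in the match, so Java substitutes empty strings for them and the tags disappear from the output.

```java
final class GroupNumberDemo {
    // The alternation from the question: each branch has its own pair of groups.
    static final String REGEX =
        "(<Exony_Credit_Card_ID>).+(</Exony_Credit_Card_ID>)"
        + "|(<Billing_Postcode>).+(</Billing_Postcode>)";

    // Hypothetical helper: applies the replacement string from the question.
    static String mask(String xml) {
        // $1 and $2 only ever refer to the first branch.
        return xml.replaceAll(REGEX, "$1filtered$2");
    }

    public static void main(String[] args) {
        // First branch: groups 1 and 2 hold the tags, so they survive.
        System.out.println(mask("<Exony_Credit_Card_ID>242394798</Exony_Credit_Card_ID>"));
        // Second branch: groups 1 and 2 are unset, so the tags are lost.
        System.out.println(mask("<Billing_Postcode>AB1 2CD</Billing_Postcode>"));
    }
}
```

The first call keeps its tags, while the second one should print just the word filtered.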

Answer 1

String resultString = subjectString.replaceAll(
    "(?x)    # (multiline regex): Match...\n" +
    "<(Exony_Credit_Card_ID|Billing_Postcode)> # one of these opening tags\n" +
    "[^<>]*  # Match whatever is contained within\n" +
    "</\\1>  # Match corresponding closing tag",
    "<$1>filtered</$1>");

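If the list of tag names is long or changes often, the alternation can be built from a collection instead of being typed by hand. The following is only a sketch of that idea: the class TagMasker and its method mask are hypothetical names, and it assumes the masked elements contain plain text with no nested tags, as in the pattern above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TagMasker {
    private final Pattern pattern;

    // tagNames: the element names whose content should be masked.
    TagMasker(List<String> tagNames) {
        // Quote each name so that any regex metacharacters are taken literally.
        String alternation = tagNames.stream()
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
        // Group 1 is the tag name; \1 forces the closing tag to match it.
        this.pattern = Pattern.compile("<(" + alternation + ")>[^<>]*</\\1>");
    }

    String mask(String xml) {
        // $1 is always the tag name, whichever alternative matched.
        return pattern.matcher(xml).replaceAll("<$1>filtered</$1>");
    }
}
```

For example, new TagMasker(Arrays.asList("Exony_Credit_Card_ID", "Billing_Postcode")).mask(xml) masks both elements and leaves every other element untouched.
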
Answer 2

In your case, I would use this:

(?<=<(Exony_Credit_Card_ID|tag1|tag2)>)(\\d+)(?=</(Exony_Credit_Card_ID|tag1|tag2)>)

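Then replace the match with filtered; the tags are excluded from the returned match, so they are left alone. Since your goal is to hide sensitive data, it is better to play it safe and use an "aggressive" match that tries to catch as much sensitive data as possible, even if it occasionally masks something that is not sensitive.

If the data contains other characters such as spaces, slashes, dashes and so on, you may need to adjust the tag content matcher (\\d+).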
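
As a sketch of that adjustment, the variant below swaps \\d+ for [^<]+ so that any text up to the next tag is masked. The class LookaroundMasker, the method mask and the choice of Billing_Postcode as the second tag are assumptions made for the example. Note that the lookbehind and the lookahead are checked independently, so this version does not verify that the opening and closing tags have the same name.

```java
import java.util.regex.Pattern;

final class LookaroundMasker {
    // Non-capturing alternation of the tag names to mask (example names).
    private static final String TAGS = "(?:Exony_Credit_Card_ID|Billing_Postcode)";

    // Only the text between the tags is part of the match;
    // the tags themselves sit inside zero-width lookarounds.
    private static final Pattern CONTENT =
        Pattern.compile("(?<=<" + TAGS + ">)[^<]+(?=</" + TAGS + ">)");

    static String mask(String xml) {
        // The match excludes the tags, so a plain literal replacement is enough.
        return CONTENT.matcher(xml).replaceAll("filtered");
    }
}
```

With this pattern a value such as "AB1 2CD", which \\d+ would not match, is masked as well.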

Answer 3

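I haven't debugged this code, but you should use something like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the text between an opening and a closing tag (note the '/').
Pattern p = Pattern.compile("<\\w+>([^<]*)</\\w+>");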
Matcher m = p.matcher(str);
if (m.find()) {
    String tagContent = m.group(1);
}

I hope this is a good start.

Answer 4

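I would use something like this:

import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 is the tag name; \1 requires the closing tag to have the same name.
private static final Pattern PAT = Pattern.compile("<(\\w+)>(.*?)</\\1>");

private static String replace(String s, Set<String> toReplace) {
    Matcher m = PAT.matcher(s);
    // matches() succeeds only if s is exactly one element, e.g. "<tag>value</tag>".
    if (m.matches() && toReplace.contains(m.group(1))) {
        return '<' + m.group(1) + '>' + "filtered" + "</" + m.group(1) + '>';
    }
    return s;
}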
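
The same set-based idea can be applied to a string that contains many elements by looping with find() instead of calling matches(). This is a sketch rather than a drop-in replacement: SetBasedMasker and mask are hypothetical names, and the content pattern is narrowed to [^<]* on the assumption that the elements to mask contain only text.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SetBasedMasker {
    // Matches leaf elements only: the content may not contain '<'.
    private static final Pattern ELEMENT = Pattern.compile("<(\\w+)>[^<]*</\\1>");

    static String mask(String xml, Set<String> toReplace) {
        Matcher m = ELEMENT.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group(1);
            // Mask listed tags; copy every other element through unchanged.
            String replacement = toReplace.contains(tag)
                ? "<" + tag + ">filtered</" + tag + ">"
                : m.group();
            // quoteReplacement keeps '$' and '\' in the text from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Keeping the tag names in a Set means that adding a tag does not require touching the regex at all.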

Answer 5

I know you said that relying on group numbers doesn't work in your case... but I really don't see why not. Couldn't you use something like this:

xmlString.replaceAll("<(Exony_Credit_Card_ID|tag2|tag3)>([^<]+)</(\\1)>", "<$1>filtered</$1>");

? This works on the basic sample I used as a test.

Edit, just to break it down:

"<(Exony_Credit_Card_ID|tag2|tag3)>" + // matches the opening tag and captures its name as group 1
"([^<]+)" + // then anything in between the opening and closing of the tag
"</(\\1)>" // and finally the end tag corresponding to what we matched as the first group (Exony_Credit_Card_ID, tag2 or tag3)

"<$1>" + // Replace using the first captured group (tag name)
"filtered" + // the "filtered" text
"</$1>" // and the closing tag corresponding to the first captured group

Regex that matches the content of specific XML tags, but not the tags themselves
