Regex to capture and replace an element's textContent

Date: 2019-05-20 21:21:55

Tags: regex java-8

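In both examples I want to replace the value of the "name" node. I use a regex group to match and replace. The grouping works, but the replacement doesn't.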

input 1
<xml
   <user:address>.../</user:address>
   <user:name>foo</user:name>
</xml>

input 2

<xml
   <user:address>.../</user:address>
   <street:name>bar</street:name>
</xml>


private static final String NAME_GROUP = "name";
public static final Pattern pattern = Pattern.compile("<.*:name>" + "(?<" + NAME_GROUP + ">.*)</.*:name>");

final Matcher nameMatcher = pattern.matcher(str);
final String s = nameMatcher.find() ? nameMatcher.group(NAME_GROUP) : null;
System.out.println(s);

// input 1 prints: foo
// input 2 prints: bar

Now when I do the replacement

String output = nameMatcher.replaceFirst("hello");

I get

hello</xml>

whereas I expected the following

<xml
       <user:address>.../</user:address>
       <user:name>hello</user:name>
    </xml>

for both examples. Why does the grouping work but the replacement doesn't?

3 answers:

Answer 0 (score: 2)

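Assuming this is just an example and you are not really trying to parse XML with regex, you can use this approach: match and capture the text before and after the value in separate capturing groups, then use those groups in the replacement to put the before and after text back into the final output.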


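final String str = "<xml\n" + 
        "   <name>bar</name>\n" + 
        "   <user:address>.../</user:address>\n" + 
        "   <user:name>foo</user:name>\n" + 
        "</xml>";

final String NAME_GROUP = "name";
// group 1 = opening tag, named group = content, group 3 = closing tag
final Pattern pattern = Pattern.compile("(<(?:[^:]+:)?name>)(?<" + NAME_GROUP + ">.*?)(</(?:[^:]+:)?name>)");
final Matcher m = pattern.matcher(str);

// StringBuffer, because Matcher.appendReplacement(StringBuilder, String) requires Java 9+
StringBuffer sb = new StringBuffer();
while (m.find()) {
     // re-emit the captured tags around the new content
     m.appendReplacement( sb, m.group(1) + "hello" + m.group(3) );
}
m.appendTail(sb);

System.out.println(sb);

Output:

<xml
   <name>hello</name>
   <user:address>.../</user:address>
   <user:name>hello</user:name>
</xml>

Note that for this particular case, the following shorter code can be used: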

final Pattern pattern = Pattern.compile("(<(?:[^:]+:)?name>).*?(</(?:[^:]+:)?name>)");
final Matcher m = pattern.matcher(str);

// $1 and $2 re-insert the opening and closing tags around the new content
String repl = m.replaceAll("$1hello$2");

System.out.println(repl);

Answer 1 (score: 1)

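My guess is that we want to replace the name element's content with some new name. One way is to create three capturing groups: one for the opening tag as the left boundary, one for the content we want to replace, and a third for the closing tag: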

(<.+?:name>)(.+?)(<\/.+?:name>)


RegEx

If this expression isn't what you need, you can modify or change it on regex101.com.

RegEx Circuit

jex.im also helps visualize the expression.


Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String regex = "(<.+?:name>)(.+?)(<\\/.+?:name>)";
        final String string = "<xml\n"
             + "   <user:address>.../</user:address>\n"
             + "   <user:name>foo</user:name>\n"
             + "</xml>\n"
             + "<xml\n"
             + "   <user:address>.../</user:address>\n"
             + "   <street:name>bar</street:name>\n"
             + "</xml>\n"
             + "<xml\n"
             + "       <user:address>.../</user:address>\n"
             + "       <user:name>hello</user:name>\n"
             + "    </xml>";
        // Java replacement strings refer to groups as $1 and $3; "\\1" would insert a literal "1"
        final String subst = "$1Any New Name You Wish Goes Here$3";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);

        System.out.println("Substitution result: " + result);
    }
}

Edit:

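If we also want to match plain <name></name> tags, we can update the expression and make the prefix part of the tags optional:

(<(.+?:)?name>)(.+?)(<\/(.+?:)?name>)

The optional prefixes are capturing groups as well, so the content is now group 3 and the closing tag is group 4; a substitution has to use $1 and $4 around the new content.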
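A minimal Java sketch of the edited expression, assuming a sample input with one plain and one prefixed tag (the class name OptionalPrefixDemo and the rename helper are illustrative, not from the answer). The slash is written unescaped, which Java does not need, and Matcher.quoteReplacement keeps a "$" or "\" in the new name from being read as a group reference:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalPrefixDemo {
    // Edited expression; the nested optional groups are capturing too:
    // 1 = opening tag, 2 = its prefix, 3 = content, 4 = closing tag, 5 = its prefix
    static final Pattern NAME_TAG =
        Pattern.compile("(<(.+?:)?name>)(.+?)(</(.+?:)?name>)");

    // Keeps both tags (groups 1 and 4) and swaps only the content (group 3)
    public static String rename(String xml, String newName) {
        return NAME_TAG.matcher(xml)
            .replaceAll("$1" + Matcher.quoteReplacement(newName) + "$4");
    }

    public static void main(String[] args) {
        String xml = "<xml\n"
            + "   <name>bar</name>\n"
            + "   <user:name>foo</user:name>\n"
            + "</xml>";
        // Both <name>bar</name> and <user:name>foo</user:name> get "hello" as content
        System.out.println(rename(xml, "hello"));
    }
}
```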


Answer 2 (score: 1)

The replaceFirst and replaceAll operations of String / Matcher always replace the entire match. They boil down to an implementation like
public static String replace(
    CharSequence source, Pattern p, String replacement, boolean all) {

    Matcher m = p.matcher(source);
    if(!m.find()) return source.toString();
    StringBuffer sb = new StringBuffer();
    do m.appendReplacement(sb, replacement); while(all && m.find());
    return m.appendTail(sb).toString();
}

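Note that before Java 9, we have to use StringBuffer instead of StringBuilder here, because the appendReplacement/appendTail overloads taking a StringBuilder were only added in Java 9.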
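To see this on the question's own pattern (with NAME_GROUP inlined as "name"), here is a small self-contained sketch; the class name WholeMatchDemo and its helper are illustrative. It prints the whole match next to the named group and then calls replaceFirst: the tags disappear because they are part of the match being replaced, not because the group failed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeMatchDemo {
    // The question's pattern, with NAME_GROUP inlined as "name"
    static final Pattern PATTERN = Pattern.compile("<.*:name>(?<name>.*)</.*:name>");

    // What replaceFirst produces: the entire match is substituted
    public static String replaceFirstResult(String input) {
        return PATTERN.matcher(input).replaceFirst("hello");
    }

    public static void main(String[] args) {
        String input = "<xml\n"
            + "   <user:address>.../</user:address>\n"
            + "   <user:name>foo</user:name>\n"
            + "</xml>\n";
        Matcher m = PATTERN.matcher(input);
        if (m.find()) {
            // group() is the whole match, group("name") only the captured content
            System.out.println("whole match: " + m.group());       // <user:name>foo</user:name>
            System.out.println("name group:  " + m.group("name")); // foo
        }
        // The tags vanish: "   hello" replaces the whole <user:name>...</user:name> element
        System.out.print(replaceFirstResult(input));
    }
}
```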

When we drop support for group references in the replacement string, we can go one level deeper into the logic and get

public static String replaceLiteral(
    CharSequence source, Pattern p, String replacement, boolean all) {

    Matcher m = p.matcher(source);
    if(!m.find()) return source.toString();
    StringBuilder sb = new StringBuilder();
    int lastEnd = 0;
    do {
        sb.append(source, lastEnd, m.start()).append(replacement);
        lastEnd = m.end();
    } while(all && m.find());
    return sb.append(source, lastEnd, source.length()).toString();
}
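One way to check that replaceLiteral really ignores the replacement syntax is to compare it with the standard API fed a quoted replacement. The sketch below copies replaceLiteral unchanged; the class name and the sample strings are illustrative. Passing the raw "$1 ..." string to replaceAll here would throw an IndexOutOfBoundsException, since the pattern has no group 1, while Matcher.quoteReplacement turns it into a literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplaceDemo {
    // replaceLiteral as given in the answer above
    public static String replaceLiteral(
        CharSequence source, Pattern p, String replacement, boolean all) {

        Matcher m = p.matcher(source);
        if (!m.find()) return source.toString();
        StringBuilder sb = new StringBuilder();
        int lastEnd = 0;
        do {
            sb.append(source, lastEnd, m.start()).append(replacement);
            lastEnd = m.end();
        } while (all && m.find());
        return sb.append(source, lastEnd, source.length()).toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("foo");
        String input = "foo and foo";
        String replacement = "$1 \\ cost"; // '$' and '\' would be special for replaceAll

        // Inserted verbatim, no group-reference processing
        System.out.println(replaceLiteral(input, p, replacement, true));
        // Same result through the standard API once the replacement is quoted
        System.out.println(p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement)));
    }
}
```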

With this code, it is easy to change the logic to replace a specific named group instead of the entire match:

public static String replaceGroupWithLiteral(
    CharSequence source, Pattern p, String groupName, String replacement, boolean all) {

    Matcher m = p.matcher(source);
    if(!m.find()) return source.toString();
    StringBuilder sb = new StringBuilder();
    int lastEnd = 0;
    do {
        sb.append(source, lastEnd, m.start(groupName)).append(replacement);
        lastEnd = m.end(groupName);
    } while(all && m.find());
    return sb.append(source, lastEnd, source.length()).toString();
}

This is already sufficient to implement your example:

private static final String NAME_GROUP = "name";
public static final Pattern pattern
    = Pattern.compile("<.*:name>" + "(?<" + NAME_GROUP + ">.*)</.*:name>");
String input =
    "<xml\n"
  + "   <user:address>.../</user:address>\n"
  + "   <user:name>foo</user:name>\n"
  + "</xml>\n";
String s = replaceGroupWithLiteral(input, pattern, NAME_GROUP, "hello", false);
System.out.println(s);

Output:

<xml
   <user:address>.../</user:address>
   <user:name>hello</user:name>
</xml>

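Though I would probably use the following pattern, whose back-reference \1 also requires the closing tag to use the same prefix as the opening tag: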

public static final Pattern pattern
    = Pattern.compile("<([^<>:]*?:name)>" + "(?<" + NAME_GROUP + ">.*)</\\1>");
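A quick sketch checking the back-reference pattern above against both inputs from the question, plus a mismatched pair of tags. It copies replaceGroupWithLiteral from above so it compiles on its own; the class name BackrefPatternDemo, the rename helper, and the mismatched sample are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefPatternDemo {
    static final String NAME_GROUP = "name";
    // Group 1 captures "prefix:name"; \1 makes the closing tag repeat it
    static final Pattern PATTERN =
        Pattern.compile("<([^<>:]*?:name)>" + "(?<" + NAME_GROUP + ">.*)</\\1>");

    // replaceGroupWithLiteral from the answer above, copied unchanged
    public static String replaceGroupWithLiteral(
        CharSequence source, Pattern p, String groupName, String replacement, boolean all) {

        Matcher m = p.matcher(source);
        if (!m.find()) return source.toString();
        StringBuilder sb = new StringBuilder();
        int lastEnd = 0;
        do {
            sb.append(source, lastEnd, m.start(groupName)).append(replacement);
            lastEnd = m.end(groupName);
        } while (all && m.find());
        return sb.append(source, lastEnd, source.length()).toString();
    }

    public static String rename(String xml) {
        return replaceGroupWithLiteral(xml, PATTERN, NAME_GROUP, "hello", true);
    }

    public static void main(String[] args) {
        System.out.println(rename("<user:name>foo</user:name>"));     // <user:name>hello</user:name>
        System.out.println(rename("<street:name>bar</street:name>")); // <street:name>hello</street:name>
        System.out.println(rename("<user:name>foo</street:name>"));   // unchanged: prefixes differ
    }
}
```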

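As said above (and made explicit by the method name), this differs from the ordinary regex replace operations in that it always inserts the replacement literally. Getting the same behavior as the original requires more complicated and less efficient code, so only use it when you really need group references in the replacement (or when supporting that syntax is part of the method's contract):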

public static String replaceGroup(
    CharSequence source, Pattern p, String groupName, String replacement, boolean all) {

    Matcher m = p.matcher(source);
    if(!m.find()) return source.toString();
    StringBuffer sb = new StringBuffer();
    do {
        int s = m.start(), gs = m.start(groupName), e = m.end(), ge = m.end(groupName);
        String prefix = s == gs? "":
            Matcher.quoteReplacement(source.subSequence(s, gs).toString());
        String suffix = e == ge? "":
            Matcher.quoteReplacement(source.subSequence(ge, e).toString());
        m.appendReplacement(sb, prefix+replacement+suffix);
    } while(all && m.find());
    return m.appendTail(sb).toString();
}

For example, if we use

String s = replaceGroup(input, pattern, NAME_GROUP, "[[${"+NAME_GROUP+"}]]", false);

we get

<xml
   <user:address>.../</user:address>
   <user:name>[[foo]]</user:name>
</xml>