Replacing multiple substrings at once

Time: 2011-10-05 12:43:52

Tags: java regex replace

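Say I have a file that contains some text. It contains substrings such as "substr1", "substr2", "substr3" and so on. I need to replace all of these substrings with other text, for example "repl1", "repl2", "repl3". In Python, I would create a dictionary like this: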

{
 "substr1": "repl1",
 "substr2": "repl2",
 "substr3": "repl3"
}

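and build a pattern by joining the keys with '|', then do the replacement with the re.sub function. Is there a similarly simple way to do this in Java?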

5 answers:

Answer 0 (score: 14)

This is how your Python suggestion translates to Java:

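import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Map<String, String> replacements = new HashMap<String, String>() {{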
    put("substr1", "repl1");
    put("substr2", "repl2");
    put("substr3", "repl3");
}};

String input = "lorem substr1 ipsum substr2 dolor substr3 amet";

// create the pattern joining the keys with '|'
String regexp = "substr1|substr2|substr3";

StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(input);

while (m.find())
    m.appendReplacement(sb, replacements.get(m.group()));
m.appendTail(sb);


System.out.println(sb.toString());   // lorem repl1 ipsum repl2 dolor repl3 amet

This approach performs the replacements simultaneously (i.e. "at once"). That is, if you happen to have

"a" -> "b"
"b" -> "c"

then this approach gives "a b" -> "b c", as opposed to the answers suggesting that you chain several calls to replace or replaceAll, which would give "c c".


(If you generalize this approach to build the regexp programmatically, make sure to Pattern.quote each individual search term and Matcher.quoteReplacement each replacement.)
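A minimal sketch of that generalization, assuming the replacements arrive as a Map. The class name MultiReplace and the helper replaceAllKeys are made up for illustration and are not part of any library:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MultiReplace {
    // Hypothetical helper: replaces every key of the map with its value in a single pass.
    static String replaceAllKeys(String input, Map<String, String> replacements) {
        // An empty map would yield an empty pattern that matches everywhere, so bail out early.
        if (replacements.isEmpty()) return input;
        // Quote each key so that regex metacharacters in it are matched literally.
        String regexp = replacements.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(regexp).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Quote the replacement so that '$' and '\' are not read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacements.get(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("a", "b");
        replacements.put("b", "c");
        replacements.put("1+1", "$2");
        System.out.println(replaceAllKeys("a b 1+1", replacements)); // b c $2
    }
}
```

Without the two quoting calls, the key "1+1" would be read as a regex quantifier and the replacement "$2" as a reference to a capture group that does not exist.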

Answer 1 (score: 6)

There is StringUtils.replaceEach in the Apache Commons Lang project, though it operates on Strings.

Answer 2 (score: 2)

yourString.replace("substr1", "repl1")
          .replace("substr2", "repl2")
          .replace("substr3", "repl3");

Answer 3 (score: 1)

First, a demonstration of the problem:

String s = "I have three cats and two dogs.";
s = s.replace("cats", "dogs")
    .replace("dogs", "budgies");
System.out.println(s);

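This is intended to replace cats => dogs and dogs => budgies, but the sequential replacement operates on the result of the previous replacement, so the unfortunate output is:

I have three budgies and two budgies.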

Here is a simultaneous replacement method I implemented. It is easy to write using String.regionMatches:

public static String simultaneousReplace(String subject, String... pairs) {
    if (pairs.length % 2 != 0) throw new IllegalArgumentException(
        "Strings to find and replace are not paired.");
    StringBuilder sb = new StringBuilder();
    int numPairs = pairs.length / 2;
    outer:
    for (int i = 0; i < subject.length(); i++) {
        for (int j = 0; j < numPairs; j++) {
            String find = pairs[j * 2];
            if (subject.regionMatches(i, find, 0, find.length())) {
                sb.append(pairs[j * 2 + 1]);
                i += find.length() - 1;
                continue outer;
            }
        }
        sb.append(subject.charAt(i));
    }
    return sb.toString();
}

Test:

String s = "I have three cats and two dogs.";
s = simultaneousReplace(s,
    "cats", "dogs",
    "dogs", "budgies");
System.out.println(s);

Output:

I have three dogs and two budgies.

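In addition, when doing simultaneous replacement it is sometimes useful to make sure the longest match is the one that gets picked. (PHP's strtr function does this, for example.) Here is my implementation of that: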

public static String simultaneousReplaceLongest(String subject, String... pairs) {
    if (pairs.length % 2 != 0) throw new IllegalArgumentException(
        "Strings to find and replace are not paired.");
    StringBuilder sb = new StringBuilder();
    int numPairs = pairs.length / 2;
    for (int i = 0; i < subject.length(); i++) {
        int longestMatchIndex = -1;
        int longestMatchLength = -1;
        for (int j = 0; j < numPairs; j++) {
            String find = pairs[j * 2];
            if (subject.regionMatches(i, find, 0, find.length())) {
                if (find.length() > longestMatchLength) {
                    longestMatchIndex = j;
                    longestMatchLength = find.length();
                }
            }
        }
        if (longestMatchIndex >= 0) {
            sb.append(pairs[longestMatchIndex * 2 + 1]);
            i += longestMatchLength - 1;
        } else {
            sb.append(subject.charAt(i));
        }
    }
    return sb.toString();
}

Why would you need this? Here is an example:

String truth = "Java is to JavaScript";
truth += " as " + simultaneousReplaceLongest(truth,
    "Java", "Ham",
    "JavaScript", "Hamster");
System.out.println(truth);

Output:

Java is to JavaScript as Ham is to Hamster

If we had used simultaneousReplace instead of simultaneousReplaceLongest, the output would have contained "HamScript" instead of "Hamster" :)
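The same pitfall can show up with a regex-based replacement: Java's alternation tries the alternatives from left to right and takes the first one that matches, so a pattern like Java|JavaScript would also yield "HamScript". A minimal sketch of one way around it, assuming the search terms are sorted longest-first before being joined with '|'. The names LongestFirstReplace and replaceLongestFirst are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestFirstReplace {
    // Hypothetical helper: simultaneous replacement that prefers the longest search term.
    static String replaceLongestFirst(String input, Map<String, String> replacements) {
        if (replacements.isEmpty()) return input;
        // Sort the keys by descending length so longer alternatives are tried first.
        List<String> keys = new ArrayList<>(replacements.keySet());
        keys.sort((x, y) -> Integer.compare(y.length(), x.length()));
        StringBuilder regexp = new StringBuilder();
        for (String key : keys) {
            if (regexp.length() > 0) regexp.append('|');
            regexp.append(Pattern.quote(key)); // match each key literally
        }
        Matcher m = Pattern.compile(regexp.toString()).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(replacements.get(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new HashMap<>();
        replacements.put("Java", "Ham");
        replacements.put("JavaScript", "Hamster");
        System.out.println(replaceLongestFirst("Java is to JavaScript", replacements));
        // Ham is to Hamster
    }
}
```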

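Note that the methods above are case-sensitive. If you need case-insensitive versions, they are easy to modify, because String.regionMatches has an overload that takes an ignoreCase parameter.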
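As a rough sketch of that modification, here is simultaneousReplace with the five-argument regionMatches(ignoreCase, toffset, other, ooffset, len) overload. The class and method names are made up for illustration, and the check that skips empty search strings is an added assumption, not part of the original method:

```java
public class CaseInsensitiveReplace {
    // Same loop as simultaneousReplace, but matching without regard to case.
    public static String simultaneousReplaceIgnoreCase(String subject, String... pairs) {
        if (pairs.length % 2 != 0) throw new IllegalArgumentException(
            "Strings to find and replace are not paired.");
        StringBuilder sb = new StringBuilder();
        int numPairs = pairs.length / 2;
        outer:
        for (int i = 0; i < subject.length(); i++) {
            for (int j = 0; j < numPairs; j++) {
                String find = pairs[j * 2];
                // Added assumption: skip empty search strings, which would match
                // at every position and keep the loop from advancing.
                if (!find.isEmpty()
                        && subject.regionMatches(true, i, find, 0, find.length())) {
                    sb.append(pairs[j * 2 + 1]);
                    i += find.length() - 1;
                    continue outer;
                }
            }
            sb.append(subject.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(simultaneousReplaceIgnoreCase(
            "I have three CATS and two Dogs.",
            "cats", "dogs",
            "dogs", "budgies"));
        // I have three dogs and two budgies.
    }
}
```

The replacement text is inserted verbatim, so the casing of the matched text is not carried over.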

Answer 4 (score: -1)

    return yourString.replaceAll("substr1", "repl1")
                     .replaceAll("substr2", "repl2")
                     .replaceAll("substr3", "repl3");