Question

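I'm trying to create a regex for the replaceAll method in Java. The test string is abXYabcXYZ and the pattern is abc. I want to replace every symbol except the pattern with +. For example, the string abXYabcXYZ with the pattern [^(abc)] should return ++++abc+++, but in my case it returns ab++abc+++.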

public static String plusOut(String str, String pattern) {
    pattern= "[^("+pattern+")]" + "".toLowerCase();
    return str.toLowerCase().replaceAll(pattern, "+");
}
public static void main(String[] args) {
    String text = "abXYabcXYZ";
    String pattern = "abc";
    System.out.println(plusOut(text, pattern));
}

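Replacing the pattern itself with + works without problems: abXYabcXYZ with the pattern (abc) returns abxy+xyz. The pattern {{1}} returns the string without any replacement.

Is there another way to write NOT (regex), or to treat a group of symbols as a word?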

Answer 1

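What you are trying to achieve is pretty tough with regular expressions, since there is no way to express "replace the strings that do not match a pattern". You will have to use a "positive" pattern that says what to match rather than what not to match.

Furthermore, you want to replace every character with the replacement character, so you have to make sure your pattern matches exactly one character. Otherwise you will replace whole runs of characters with a single character and get back a shorter string.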
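
To illustrate the length problem, here is a small sketch. The pattern [^abc]+ and the class name LengthDemo are picked purely for illustration and do not come from the question; the input is already lowercased.

```java
final class LengthDemo {
    // Replaces every match of the given regex with a single plus sign.
    static String mask(String input, String regex) {
        return input.replaceAll(regex, "+");
    }

    public static void main(String[] args) {
        String input = "abxyabcxyz";
        // One match per character: the length is preserved.
        System.out.println(mask(input, "[^abc]"));   // ab++abc+++
        // One match per run of characters: each run collapses to one '+'.
        System.out.println(mask(input, "[^abc]+"));  // ab+abc+
    }
}
```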

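For your toy example, you can use negative lookaheads and lookbehinds to do the job, but this may be harder for real-world examples with longer or more complex strings, since you have to consider each character of your string separately, along with its context.

Here is the pattern for "not 'abc'":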

[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c

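It consists of five sub-patterns, connected with "or" (|), each of which matches exactly one character:

[^abc] matches every character except a, b or c
a(?!bc) matches a if it is not followed by bc
(?<!a)b matches b if it is not preceded by a
b(?!c) matches b if it is not followed by c
(?<!ab)c matches c if it is not preceded by ab

The idea is to match every character that does not occur in your target word abc, plus every character that, according to its context, is not part of your word. The context can be checked with negative lookaheads (?!...) and negative lookbehinds (?<!...).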
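
As a quick check, the pattern above can be dropped into the plusOut method from the question. This is only a sketch: the class name NotAbcDemo and the constant NOT_ABC are made up here, and the pattern is hard-wired to the word abc.

```java
final class NotAbcDemo {
    // The five single-character alternatives described above.
    static final String NOT_ABC = "[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c";

    // Lowercases the input (as the question does) and masks everything
    // that is not part of an occurrence of "abc".
    static String plusOut(String str) {
        return str.toLowerCase().replaceAll(NOT_ABC, "+");
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ")); // ++++abc+++
    }
}
```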

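You can imagine that this technique breaks down as soon as your target word contains a character more than once, like example. It is really hard to express "match e if it is not followed by x and not preceded by l".

Especially for dynamic patterns, it is far easier to do a positive search and then, in a second pass, replace every character that did not match, as others have suggested here.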
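
One possible shape for such a positive search is sketched below, using String.indexOf instead of a regex. The class name PositiveSearchSketch and the parameter name target are assumptions made for this example, and the lowercasing from the question is left to the caller.

```java
final class PositiveSearchSketch {
    // Keeps every occurrence of target and masks all other characters.
    static String plusOut(String text, String target) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        StringBuilder out = new StringBuilder(text.length());
        int pos = 0;   // first character not yet written to out
        int hit;       // start index of the next occurrence of target
        while ((hit = text.indexOf(target, pos)) >= 0) {
            // Mask the gap between the previous match and this one.
            for (int i = pos; i < hit; i++) {
                out.append('+');
            }
            out.append(target);
            pos = hit + target.length();
        }
        // Mask whatever follows the last match.
        for (int i = pos; i < text.length(); i++) {
            out.append('+');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abxyabcxyz", "abc")); // ++++abc+++
    }
}
```

Because the target is searched literally, regex metacharacters in it need no escaping.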

Answer 2

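[^ ...] will match one character that is not any of the characters in ....

So your pattern "[^(abc)]" is saying "match one character that is not a, b, c, or a left or right parenthesis"; and indeed that is what happens in your test.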
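
The effect is easy to see in a short sketch. The class name CharClassDemo and the second test string (xbyc) are invented for this illustration.

```java
final class CharClassDemo {
    // Replaces every match of the given regex with a single plus sign.
    static String mask(String input, String regex) {
        return input.replaceAll(regex, "+");
    }

    public static void main(String[] args) {
        // Inside [...] the parentheses are literal characters, not a group,
        // so a, b, c, '(' and ')' are all left untouched.
        System.out.println(mask("abxyabcxyz", "[^(abc)]")); // ab++abc+++
        System.out.println(mask("(xbyc)", "[^(abc)]"));     // (+b+c)
        // The same character set written in a less misleading order.
        System.out.println(mask("(xbyc)", "[^abc()]"));     // (+b+c)
    }
}
```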

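It is hard to say "replace all characters that are not part of the string 'abc'" in a single simple regular expression. To achieve what you want, you might instead do something nasty like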

while the input string still contains "abc"
   find the next occurrence of "abc"
   append to the output a string containing as many "+"s as there are characters before the "abc"
   append "abc" to the output string
   skip, in the input string, to a position just after the "abc" found
append to the output a string containing as many "+"s as there are characters left in the input

or, if the input alphabet is restricted, you could use regular expressions to do something like

replace all occurrences of "abc" with a single character that does not occur anywhere in the existing string
replace all other characters with "+"
replace all occurrences of the target character with "abc"

which is more readable but may not perform as well.
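
The three replacement steps could look roughly like this in Java. This is a sketch under stated assumptions: the placeholder is a private-use code point (U+E000) that is assumed never to occur in real input, and the names PlaceholderSketch, PLACEHOLDER and target are invented here.

```java
final class PlaceholderSketch {
    // Assumption: this private-use code point never appears in the input.
    private static final char PLACEHOLDER = '\uE000';

    static String plusOut(String text, String target) {
        if (target.isEmpty() || text.indexOf(PLACEHOLDER) >= 0) {
            throw new IllegalArgumentException(
                "empty target, or placeholder already present in the input");
        }
        String marker = String.valueOf(PLACEHOLDER);
        // Step 1: collapse every occurrence of target into one placeholder.
        String collapsed = text.replace(target, marker);
        // Step 2: mask every character that is not the placeholder.
        String masked = collapsed.replaceAll("[^" + marker + "]", "+");
        // Step 3: expand each placeholder back into target.
        return masked.replace(marker, target);
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abxyabcxyz", "abc")); // ++++abc+++
    }
}
```

Steps 1 and 3 use String.replace, which treats its arguments literally, so only step 2 involves a regex.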

Answer 3

Negating regexps is usually troublesome. I think you might want to use negative lookahead. Something like this might work:

String pattern = "(?<!ab).(?!abc)";

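I didn't test it, so it may not really work for degenerate cases. The performance might be horrible, too. It is probably better to use a multi-step algorithm.

Edit: No, I think this won't work for every case. You will probably spend more time debugging a regexp like this than doing it algorithmically with some extra code.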
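
A quick check on the lowercased string from the question seems to confirm the edit: the characters right after an ab and right before an abc are skipped, and the a and b of the real match are masked. The class name LookaroundCheck is made up for this sketch.

```java
final class LookaroundCheck {
    // Applies the lookaround pattern proposed in this answer.
    static String mask(String input) {
        return input.replaceAll("(?<!ab).(?!abc)", "+");
    }

    public static void main(String[] args) {
        // Desired result would be ++++abc+++.
        System.out.println(mask("abxyabcxyz")); // ++xy++c+++
    }
}
```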

Answer 4

Try to solve it without regular expressions:

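// Assumes text and pattern are the String variables from the question,
// and that pattern is not empty (an empty pattern would never advance i).
String out = "";
int i;
// Walk every position where a full copy of pattern could still fit.
for(i=0; i<text.length() - pattern.length() + 1; ) {
    if (text.substring(i, i + pattern.length()).equals(pattern)) {
        // Found the pattern here: copy it unchanged and jump past it.
        out += pattern;
        i += pattern.length();
    }
    else {
        // Any other character is masked with a plus sign.
        out += "+";
        i++;
    }
}
// The remaining tail is shorter than pattern, so it cannot contain a match.
for(; i<text.length(); i++) {
    out += "+";
}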

Answer 5

Rather than a single replaceAll, you could try something like:

    @Test
    public void testString() {
        final String in = "abXYabcXYabcHIH";
        final String expected = "xxxxabcxxabcxxx";
        String result = replaceUnwanted(in);
        assertEquals(expected, result);
    }

    private String replaceUnwanted(final String in) {
        final Pattern p = Pattern.compile("(.*?)(abc)([^a]*)");
        final Matcher m = p.matcher(in);
        final StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.group(1).replaceAll(".", "x"));
            out.append(m.group(2));
            out.append(m.group(3).replaceAll(".", "x"));
        }
        return out.toString();
    }

Answer 6

Instead of using replaceAll(...), I'd go for a Pattern/Matcher approach:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static String plusOut(String str, String pattern) {
        StringBuilder builder = new StringBuilder();
        String regex = String.format("((?:(?!%s).)++)|%s", pattern, pattern);
        Matcher m = Pattern.compile(regex).matcher(str.toLowerCase());
        while(m.find()) {
            builder.append(m.group(1) == null ? pattern : m.group().replaceAll(".", "+"));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String text = "abXYabcXYZ";
        String pattern = "abc";
        System.out.println(plusOut(text, pattern));
    }

}

Note that you'll need to use Pattern.quote(...) if your String pattern contains regex meta-characters.
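
A variant with the quoting applied might look like the sketch below. It is not the code from this answer: the class name QuotedPlusOut, the parameter name target, the DOTALL flag (so that line breaks are masked too) and the omission of toLowerCase() are all choices made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedPlusOut {
    static String plusOut(String str, String target) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        // Pattern.quote makes the target match literally, even if it
        // contains metacharacters such as '.' or '('.
        String quoted = Pattern.quote(target);
        String regex = String.format("((?:(?!%s).)++)|%s", quoted, quoted);
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(str);
        StringBuilder builder = new StringBuilder();
        while (m.find()) {
            if (m.group(1) == null) {
                // Second alternative matched: this is the target itself.
                builder.append(target);
            } else {
                // First alternative matched: mask each character of the run.
                for (int i = 0; i < m.group(1).length(); i++) {
                    builder.append('+');
                }
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abcxa.c", "a.c")); // ++++a.c
    }
}
```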

Edit: I didn't see that a Pattern/Matcher approach had already been suggested by toolkit (although slightly different)...

Regex issue in Java
