How can I remove part of a string containing special characters using a regex or replaceAll?

Time: 2016-04-26 17:09:12

Tags: java regex

Here are the strings:

1. "AAA BBB  CCCCC CCCCCCC"
2. "  AAA              BBB  DDDD DDDD DDDDD"
3. "    EEE         FFF  GGGGG GGGGG"

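The amount of whitespace at the start, and between the first and second words, can vary. So I need a regex that removes everything before the third word, so that it always returns " CCCCC CCCCCCC", " DDDD DDDD DDDDD", or " GGGGG GGGGG". I'm assuming this can be done with a regex rather than by parsing all the words in the string.

3 answers:

Answer 0: (score: 1)

You need to use a capturing group to extract the data you want.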

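import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

String result = null;

try {
    // Skip the first two words, then capture the rest of the string in group 1.
    Pattern regex = Pattern.compile("\\s*\\w+\\s*\\w+\\s*([\\w| ]+)");
    Matcher regexMatcher = regex.matcher("  AAA              BBB  DDDD DDDD DDDDD");
    if (regexMatcher.find()) {
        result = regexMatcher.group(1); // result = "DDDD DDDD DDDDD"
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}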

Regex explanation

"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"\\w" +           // Match a single character that is a “word character” (letters, digits, and underscores)
   "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"\\w" +           // Match a single character that is a “word character” (letters, digits, and underscores)
   "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"(" +            // Match the regular expression below and capture its match into backreference number 1
   "[\\w| ]" +       // Match a single character present in the list below
                       // A word character (letters, digits, and underscores)
                       // One of the characters “| ” (inside a character class, | is a literal pipe, not alternation)
      "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
")" 

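To check the approach against all three sample strings from the question, a minimal sketch might look like the following. The class and method names (ThirdWordOnward, extract) are hypothetical. The pattern is an assumed, slightly stricter variant of the one above rather than the answer's exact regex: it uses \s+ between words so that a single long word cannot be split into several "words" by backtracking, and it drops the literal | from the character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThirdWordOnward {
    // Assumed variant of the pattern above: \s+ requires real whitespace between words.
    private static final Pattern THIRD_ONWARD =
            Pattern.compile("\\s*\\w+\\s+\\w+\\s+([\\w ]+)");

    /** Returns the text from the third word onward, or null if there is no third word. */
    static String extract(String input) {
        Matcher m = THIRD_ONWARD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "AAA BBB  CCCCC CCCCCCC",
            "  AAA              BBB  DDDD DDDD DDDDD",
            "    EEE         FFF  GGGGG GGGGG"
        };
        for (String s : samples) {
            // Brackets make leading or trailing whitespace visible.
            System.out.println("[" + extract(s) + "]");
        }
    }
}
```

Because group 1 only allows word characters and spaces, punctuation in the remaining text would cut the capture short; that limitation is inherited from the original pattern.
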
Answer 1: (score: 1)

This regex will work:

\s*\w+\s+\w+\s+(.+$)

Java code

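import java.util.regex.Matcher;
import java.util.regex.Pattern;

// (?m) enables multiline mode so that $ matches at the end of every line.
String pattern = "(?m)\\s*\\w+\\s+\\w+\\s+(.+$)";
String line = "AAA BBB  CCCCC CCCCCCC\n  AAA              BBB  DDDD DDDD DDDDD\n    EEE         FFF  GGGGG GGGGG";

Pattern r = Pattern.compile(pattern);

Matcher m = r.matcher(line);
// Each match covers one line; group 1 is everything from the third word on.
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
}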
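
Since the question title also mentions replaceAll, here is a hedged sketch of the same idea expressed as a replacement instead of a capture: the prefix is matched and deleted. The class and method names (StripFirstTwoWords, strip) are made up for illustration, and the pattern is an adaptation of the one above, anchored with ^ and used with String.replaceFirst on a single-line input.

```java
class StripFirstTwoWords {
    /**
     * Removes leading whitespace, the first two words, and the whitespace
     * that follows them. Input with fewer than three words is returned unchanged.
     */
    static String strip(String input) {
        return input.replaceFirst("^\\s*\\w+\\s+\\w+\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("AAA BBB  CCCCC CCCCCCC"));
        System.out.println(strip("  AAA              BBB  DDDD DDDD DDDDD"));
        System.out.println(strip("    EEE         FFF  GGGGG GGGGG"));
    }
}
```

If the single leading space shown in the question's expected output matters, passing " " as the replacement instead of "" should keep it.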

Answer 2: (score: 1)

Similar to @rock321987's answer, you can modify the regex to use a quantifier that skips any number of leading words you don't want.

\s*(?:\w+\s+){2}(.+$)

Or in Java:

"\\s*(?:\\w+\\s+){2}(.+$)"

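?: makes the pattern inside () a non-capturing group. The number inside {} is how many leading words (each followed by whitespace) you want to skip.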
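
To illustrate how the number in {} controls the result, the following sketch builds the pattern with the word count as a parameter. The names SkipWords and skip are hypothetical and not part of the answer; the method assumes a single-line input and a non-negative count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipWords {
    /**
     * Skips the first n whitespace-terminated words and returns the rest,
     * or null if the input has no text left after n words.
     */
    static String skip(String input, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // Same pattern as above, with the quantifier {2} replaced by {n}.
        Pattern p = Pattern.compile("\\s*(?:\\w+\\s+){" + n + "}(.+$)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "  AAA              BBB  DDDD DDDD DDDDD";
        System.out.println(skip(s, 1)); // skip one word
        System.out.println(skip(s, 2)); // skip two words, as in the question
        System.out.println(skip(s, 3)); // skip three words
    }
}
```

Compiling the pattern on every call keeps the sketch short; if the count is fixed, compiling it once into a static final field would avoid the repeated work.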