RegEx: ignore text between quotes

Time: 2011-02-07 03:52:35

Tags: java regex pattern-matching

I have a regular expression, [\\.|\\;|\\?|\\!][\\s]
It is used to split a string. But I don't want it to split on . ; ? ! when they appear inside quotes.
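
To make the problem concrete, here is a minimal sketch of how the pattern above behaves with String.split. The class name NaiveSplitDemo, the method naiveSplit and the sample sentences are illustrative assumptions, not part of the question. Note also that inside a character class the | is a literal pipe rather than alternation, so the class as written matches | as well; [.;?!]\\s is probably what was intended.

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // The pattern from the question, as a Java string literal.
    static final String NAIVE = "[\\.|\\;|\\?|\\!][\\s]";

    // Hypothetical helper: split on the question's pattern, with no quote handling.
    public static String[] naiveSplit(String text) {
        return text.split(NAIVE);
    }

    public static void main(String[] args) {
        // The "! " inside the quotes is treated as a separator too.
        System.out.println(Arrays.toString(naiveSplit("He said \"Stop! Now.\" and left. Bye")));
        // The pipe inside the character class is matched literally.
        System.out.println(Arrays.toString(naiveSplit("a| b")));
    }
}
```

The first call prints [He said "Stop, Now." and left, Bye], so the quoted text is cut in two, which is the behavior the question wants to avoid.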

2 answers:

Answer 0: (score: 6)

I wouldn't use split here; I'd use Pattern and Matcher instead.

Demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {

        String text = "start. \"in quotes!\"; foo? \"more \\\" words\"; bar";

        String simpleToken = "[^.;?!\\s\"]+";

        String quotedToken =
                "(?x)             # enable inline comments and ignore white spaces in the regex         \n" +
                "\"               # match a double quote                                                \n" +
                "(                # open group 1                                                        \n" +
                "  \\\\.          #   match a backslash followed by any char (other than line breaks)   \n" +
                "  |              #   OR                                                                \n" +
                "  [^\\\\\\r\\n\"] #   any character other than a backslash, line breaks or double quote \n" +
                ")                # close group 1                                                       \n" +
                "*                # repeat group 1 zero or more times                                   \n" +
                "\"               # match a double quote                                                \n";

        String regex = quotedToken + "|" + simpleToken;

        Matcher m = Pattern.compile(regex).matcher(text);

        while(m.find()) {
            System.out.println("> " + m.group());
        }
    }
}

Produces:

> start
> "in quotes!"
> foo
> "more \" words"
> bar

As you can see, it also handles escaped quotes inside quoted tokens.
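
For reuse, the same two patterns can be compiled once and wrapped in a method that returns the tokens as a list. This is only a sketch of the demo above: the class name QuoteAwareTokenizer and the method tokenize are hypothetical, and the quoted-token pattern is written without (?x) comments, with \\r and \\n as regex escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareTokenizer {
    // Quoted token: opening quote, then escaped chars or ordinary chars, then closing quote.
    private static final String QUOTED = "\"(?:\\\\.|[^\\\\\\r\\n\"])*\"";
    // Simple token: a run of characters that are not separators, whitespace or quotes.
    private static final String SIMPLE = "[^.;?!\\s\"]+";
    private static final Pattern TOKEN = Pattern.compile(QUOTED + "|" + SIMPLE);

    // Hypothetical helper: collect every quoted or simple token, in order.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "start. \"in quotes!\"; foo? \"more \\\" words\"; bar";
        for (String token : tokenize(text)) {
            System.out.println("> " + token);
        }
    }
}
```

One thing to keep in mind: because the simple token excludes whitespace, unquoted text is returned word by word, so tokenize("two words") gives two tokens rather than one sentence.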

Answer 1: (score: 0)

This is what I do to skip over quoted text in a match.

(?:[^\"\']|(?:\".*?\")|(?:\'.*?\'))*?    # <-- append the query you wanted to search for - don't use something greedy like .* in the rest of your regex.

To adapt it to your regex, you could do:

(?:[^\"\']|(?:\".*?\")|(?:\'.*?\'))*?[.;?!]\s*
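
A rough sketch of how this adapted pattern might be driven from Java with Matcher.find follows. The class name QuoteAwareSentences and the method sentences are hypothetical. Taking the substring from the previous match end, rather than m.group(), and appending the unterminated tail are my own additions so that no text is dropped; they are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareSentences {
    // The adapted pattern from this answer, as a Java string literal.
    private static final Pattern SENTENCE =
            Pattern.compile("(?:[^\"']|(?:\".*?\")|(?:'.*?'))*?[.;?!]\\s*");

    // Hypothetical helper: cut the text after each terminator found outside quotes.
    public static List<String> sentences(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        int end = 0;
        while (m.find()) {
            // Take everything since the previous match so skipped characters are kept.
            parts.add(text.substring(end, m.end()).trim());
            end = m.end();
        }
        // Keep whatever follows the last terminator (assumption: the tail is wanted).
        String tail = text.substring(end).trim();
        if (!tail.isEmpty()) {
            parts.add(tail);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(sentences("He said \"Stop! Now.\" and left. Bye"));
    }
}
```

Unlike split, each part keeps its terminator. The pattern has limits worth testing against real input: \s* also cuts where no whitespace follows the terminator (the question required one whitespace character), and when no terminator remains outside a quoted span the matcher can still find one inside it.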