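Regular expression to extract strings within delimiters

Time: 2013-01-03 13:50:48

Tags: java regex

I am trying to extract the strings that occur within delimiters (parentheses in this case), but not the ones that occur inside quotes (single or double). Here is what I have tried. This regex extracts every occurrence within parentheses, including the ones inside quotes (which I do not want).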

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMain {
    static final String PATTERN = "\\(([^)]+)\\)";
    static final Pattern CONTENT = Pattern.compile(PATTERN);
    /**
     * @param args
     */
    public static void main(String[] args) {
        String testString = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        Matcher match = CONTENT.matcher(testString);
        while(match.find()) {
            // group() is the whole match, so this prints (Jack), (Jill) and (Peter's)
            System.out.println(match.group());
        }
    }
}

3 answers:

Answer 0: (score: 1)

You could try

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMain {
    static final String PATTERN = "\\(([^)]+)\\)|\"[^\"]*\"";
    static final Pattern CONTENT = Pattern.compile(PATTERN);
    /**
     * @param args
     */
    public static void main(String[] args) {
        String testString = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        Matcher match = CONTENT.matcher(testString);
        while(match.find()) {
            // group(1) is null when the quoted alternative matched
            if(match.group(1) != null) {
                System.out.println(match.group(1)); // prints Jack, Jill
            }
        }
    }
}

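This pattern matches quoted strings as well as parenthesized ones, but only the parenthesized alternative puts anything into group(1). The engine scans from left to right, and the opening double quote comes before the opening parenthesis, so the quoted alternative matches first; its greedy [^"]* then runs to the closing quote. The whole "(Peter's)" is therefore consumed as a quoted string and (Peter's) is never matched on its own. Note that the pattern as written only skips double-quoted strings, not single-quoted ones.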
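To make that behaviour visible, here is a minimal sketch that records every match of the same pattern together with its group(1). The class name AlternationTrace and the trace helper are hypothetical, made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the answer above.
class AlternationTrace {
    // Same pattern as in the answer: parenthesized text OR a double-quoted string.
    static final Pattern CONTENT = Pattern.compile("\\(([^)]+)\\)|\"[^\"]*\"");

    // Returns one line per match: the full match and what ended up in group(1).
    static List<String> trace(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = CONTENT.matcher(input);
        while (m.find()) {
            lines.add(m.group() + " -> group(1) = " + m.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        for (String line : trace(s)) {
            System.out.println(line);
        }
    }
}
```

For the test string this yields three matches; the third one is the entire quoted string, and its group(1) is null, which is why the null check filters it out.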

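Answer 1: (score: 1)

In this case you can elegantly use lookbehind and lookahead operators to get what you want. Here is a solution in Python (I always use it to try things out quickly on the command line), but the regular expression should be the same in Java code.

This regex matches content that is preceded by an opening parenthesis, using a positive lookbehind, and followed by a closing parenthesis, using a positive lookahead. It rejects the match when the opening parenthesis is itself preceded by a single or double quote (negative lookbehind), or when the closing parenthesis is followed by a single or double quote (negative lookahead).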

In [1]: import re

In [2]: s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request."

In [3]: re.findall(r"""
   ...:     (?<=               # start of positive look-behind
   ...:         (?<!           # start of negative look-behind
   ...:             [\"\']     # avoids matching opening parenthesis preceded by single or double quote
   ...:         )              # end of negative look-behind
   ...:         \(             # matches opening parenthesis
   ...:     )                  # end of positive look-behind
   ...:     \w+ (?: \'\w* )?   # matches whatever your content looks like (configure this yourself)             
   ...:     (?=                # start of positive look-ahead
   ...:         \)             # matches closing parenthesis 
   ...:         (?!            # start of negative look-ahead
   ...:             [\"\']     # avoids matching closing parenthesis succeeded by single or double quote
   ...:         )              # end of negative look-ahead  
   ...:     )                  # end of positive look-ahead
   ...:     """, 
   ...:     s, 
   ...:     flags=re.X)
Out[3]: ['Jack', 'Jill']
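Since the question is tagged java, here is a rough sketch of how the same expression might be carried over to java.util.regex. The class name LookaroundExtract and the extract method are hypothetical; the pattern is the one from the Python session with whitespace and comments removed and backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch of the Python regex ported to Java.
class LookaroundExtract {
    static final Pattern CONTENT = Pattern.compile(
        "(?<=(?<![\"'])\\()"      // preceded by "(" that is not itself preceded by a quote
        + "\\w+(?:'\\w*)?"        // the content: word characters, optionally an apostrophe part
        + "(?=\\)(?![\"']))"      // followed by ")" that is not itself followed by a quote
    );

    // Collects every match of the pattern in the input.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CONTENT.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        System.out.println(extract(s)); // prints [Jack, Jill]
    }
}
```

As in the Python version, the content part only accepts word characters with an optional apostrophe, so it would need adjusting for parenthesized text that contains spaces or punctuation.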

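Answer 2: (score: 0)

Note: this is not a final answer, since I am not familiar with Java, but I believe it can still be converted to Java.

As far as I can tell, the simplest approach is to replace the quoted parts of the string with an empty string and then look for matches. Hopefully you are somewhat familiar with PHP; here is the idea.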

$str = "Rhyme (Jack) and (Jill) went up the hill on \" (Peter's)\" request.";

preg_match_all(
    $pat = '~(?<=\().*?(?=\))~',
    // anything inside parentheses
    preg_replace('~([\'"]).*?\1~','',$str),
    // this replaces quoted strings with ''
    $matches
    // and assigns the result into this variable
);
print_r($matches[0]);
// $matches[0] returns the matches in preg_match_all

// [0] => Jack
// [1] => Jill
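One possible Java rendering of the same two-step idea is sketched below. The class name StripQuotesExtract and the extract method are hypothetical; the two regexes are the ones from the PHP snippet, with backslashes doubled for Java string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch of the PHP approach translated to Java.
class StripQuotesExtract {
    // A quote character, anything (lazily), then the same quote character again.
    static final Pattern QUOTED = Pattern.compile("([\"']).*?\\1");
    // Anything (lazily) between an opening and a closing parenthesis.
    static final Pattern IN_PARENS = Pattern.compile("(?<=\\().*?(?=\\))");

    static List<String> extract(String input) {
        // Step 1: drop the quoted parts of the string.
        String stripped = QUOTED.matcher(input).replaceAll("");
        // Step 2: collect whatever sits inside parentheses in what is left.
        List<String> result = new ArrayList<>();
        Matcher m = IN_PARENS.matcher(stripped);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        System.out.println(extract(s)); // prints [Jack, Jill]
    }
}
```

One caveat with this approach: an apostrophe outside any quoted string (for example in a word like Jack's) would be treated as an opening quote and paired with the next single quote, so text in between could be removed by mistake.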