Complex regular expression

Time: 2011-01-20 10:25:15

Tags: java regex

I want to use a regular expression in my Java program to recognize some features of my strings. I have strings of this type:

  

-Author- has wrote (-hh-:-mm-)

So, for example, I have the string:

  

Cecco has wrote (15:12)

I want to extract the author, hh, and mm fields. Obviously there are some restrictions to consider:

hh and mm must be numbers

the author has no restrictions

there must be a space between "has wrote" and (

I don't know how to use regular expressions. Can you help me?

Edit: here is my code:

            String mRegex = "(\\s)+ has wrote \\((\\d\\d):(\\d\\d)\\)";
            Pattern mPattern = Pattern.compile(mRegex);

            String[] str = {
                "Cecco CQ has wrote (14:55)", //OK (matched)
                "yesterday you has wrote that I'm crazy", //NO (different text)
                "Simon has wrote (yesterday)", // NO (yesterday isn't numbers)
                "John has wrote (22:32)", //OK
                "James has wrote(22:11)", //NO (missed space between has wrote and ()
                "Tommy has wrote (xx:ss)" //NO (xx and ss aren't numbers)
            };

            for(String s : str) {
                Matcher mMatcher = mPattern.matcher(s);
                while (mMatcher.find()) {
                    System.out.println(mMatcher.group());
                }
            }

3 answers:

Answer 0 (score: 2)

Homework?

Something like:

(.+) has wrote \((\d\d):(\d\d)\)

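should do the trick.

  • () - marks a group to capture (there are three above)
  • .+ - one or more of any character (you said there are no restrictions)
  • \d - any digit
  • \( and \) - escaped parens, matched as literal characters rather than opening a capturing group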

Usage:

Pattern p = Pattern.compile("(.+) has wrote \\((\\d\\d):(\\d\\d)\\)");

Matcher m = p.matcher("Gareth has wrote (12:00)");

if( m.matches()){
    System.out.println(m.group(1));
    System.out.println(m.group(2));
    System.out.println(m.group(3));
}
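
To check the pattern above against the sample strings from the question, a minimal sketch along these lines may help. The class name `WroteDemo` and the helper `parse` are made up for illustration and are not part of the answer. It uses `matches()`, so the whole line has to fit the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WroteDemo {
    // Same pattern as in the answer: author, " has wrote ", then (hh:mm).
    static final Pattern WROTE =
            Pattern.compile("(.+) has wrote \\((\\d\\d):(\\d\\d)\\)");

    // Hypothetical helper: returns {author, hh, mm}, or null when the
    // whole line does not match the pattern.
    static String[] parse(String line) {
        Matcher m = WROTE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // The six sample strings from the question.
        String[] samples = {
            "Cecco CQ has wrote (14:55)",
            "yesterday you has wrote that I'm crazy",
            "Simon has wrote (yesterday)",
            "John has wrote (22:32)",
            "James has wrote(22:11)",
            "Tommy has wrote (xx:ss)"
        };
        for (String s : samples) {
            String[] parts = parse(s);
            System.out.println(s + " -> "
                    + (parts == null ? "no match" : String.join(" | ", parts)));
        }
    }
}
```

Because `.+` also accepts spaces, the author "Cecco CQ" is captured as a whole, while the line without a space before the paren is rejected.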

To handle an optional (HH:mm) at the end, you need to start using some dark regex voodoo:

Pattern p = Pattern.compile("(.+) has wrote\\s?(?:\\((\\d\\d):(\\d\\d)\\))?");

Matcher m = p.matcher("Gareth has wrote (12:00)");

if( m.matches()){
    System.out.println(m.group(1));
    System.out.println(m.group(2));
    System.out.println(m.group(3));
}

m = p.matcher("Gareth has wrote");
if( m.matches()){       
    System.out.println(m.group(1));
    // m.group(2) == null since it didn't match anything
}

The new pattern, without the Java string escaping:

(.+) has wrote\s?(?:\((\d\d):(\d\d)\))?
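  • \s? optionally matches one whitespace character (without the (HH:mm) group there may be no trailing space)
  • (?: ... ) is a non-capturing group, which lets you put ? after it to make the whole group optional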
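
As a rough sketch of how the optional group might be handled in code, the example below wraps the pattern above in a helper that checks `group(2)` for `null`. The names `OptionalTimeDemo` and `describe`, and the output strings, are assumptions made for this illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalTimeDemo {
    // Pattern from the answer: the (hh:mm) part and the space before it
    // are both optional.
    static final Pattern WROTE = Pattern.compile(
            "(.+) has wrote\\s?(?:\\((\\d\\d):(\\d\\d)\\))?");

    // Hypothetical helper: returns a short description of the line,
    // or null when the line does not match at all.
    static String describe(String line) {
        Matcher m = WROTE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String author = m.group(1);
        // group(2) is null when the optional (hh:mm) part was absent.
        if (m.group(2) == null) {
            return author + " (no time)";
        }
        return author + " at " + m.group(2) + ":" + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(describe("Gareth has wrote (12:00)"));
        System.out.println(describe("Gareth has wrote"));
        System.out.println(describe("James has wrote(22:11)"));
    }
}
```

One consequence worth keeping in mind: because `\s?` makes the space optional, "James has wrote(22:11)" is accepted by this pattern, although the question lists that string as one that should not match.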

I think @codinghorror has something to say about regex.

Answer 1 (score: 1)

The easiest way to work out a regular expression is to try it in a testing tool before you write any code. I use the Eclipse plugin from http://www.brosinski.com/regex/

Using it, I came up with the following result:

([a-zA-Z]*) has wrote \((\d\d):(\d\d)\)
Cecco has wrote (15:12)

Found 1 match(es):

start=0, end=23
Group(0) = Cecco has wrote (15:12)
Group(1) = Cecco
Group(2) = 15
Group(3) = 12

An excellent guide to regular expression syntax can be found at http://www.regular-expressions.info/tutorial.html

Answer 2 (score: 0)

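Well, in case you didn't know, Matcher has a nice function for pulling out specific groups, that is, the parts of the pattern enclosed by (): Matcher.group(int). For example, if I wanted to match the number between two colons, like: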

:22:

I could use the regex ":(\\d+):" to match one or more digits between two colons, and then get just the digits with:

Matcher.group(1)

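Then it is just a matter of parsing the String into an int. Note that group numbers start at 1. group(0) is the whole match, so for the previous example Matcher.group(0) would return :22: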
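
A minimal sketch of that idea is shown below. The class `GroupDemo`, the helper `firstNumber`, and the choice of -1 as the "not found" value are assumptions made for this illustration. It uses `find()` rather than `matches()`, so the colons may appear anywhere in the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // One or more digits between two colons; the digits are group 1.
    static final Pattern BETWEEN_COLONS = Pattern.compile(":(\\d+):");

    // Hypothetical helper: returns the first number found between two
    // colons, or -1 when there is none. Very long digit runs would make
    // Integer.parseInt throw a NumberFormatException.
    static int firstNumber(String text) {
        Matcher m = BETWEEN_COLONS.matcher(text);
        if (!m.find()) {
            return -1;
        }
        // group(0) would be the whole match including the colons;
        // group(1) is only the digits.
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(firstNumber(":22:"));
        System.out.println(firstNumber("no digits here"));
    }
}
```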

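For your case, I think the regex pieces you need to consider are:

  • "[A-Za-z]" for alphabetic characters (you could probably safely use "\\w", which matches alphabetic characters as well as digits and _).
  • "\\d" for digits (1, 2, 3 ...)
  • "+" to indicate that you want one or more of the preceding character or group.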
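
One possible way to put these pieces together is sketched below. This particular combination is an assumption and is not spelled out in the answer, and the names `PiecesDemo` and `parse` are hypothetical. It uses "\\w+" for the author and "\\d+" for each number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PiecesDemo {
    // Assumed combination of the pieces: word characters for the author,
    // one or more digits for hh and for mm.
    static final Pattern WROTE =
            Pattern.compile("(\\w+) has wrote \\((\\d+):(\\d+)\\)");

    // Hypothetical helper: returns {author, hh, mm}, or null when the
    // whole line does not match.
    static String[] parse(String line) {
        Matcher m = WROTE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("John has wrote (22:32)");
        System.out.println(parts == null ? "no match" : String.join(" | ", parts));
    }
}
```

Two trade-offs of this variant: "\\w+" does not accept spaces, so an author such as "Cecco CQ" fails with `matches()`, and "\\d+" accepts any number of digits, so "(1:5)" passes as well. If the author really has no restrictions and hh and mm are always two digits, ".+" and "\\d\\d" are the closer fit.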