Matching multiple patterns with Java regex

Time: 2011-05-11 12:30:29

Tags: java regex

I have a file containing records in the following format:

1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css

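Each record has 11 fields ([02/Oct/2010:00:00:38 +0530] is a single field).

I want to write code that extracts, say, fields 7, 8 and 9. Is it possible to extract these fields with a Java regular expression?

Can a single regular expression match several patterns like the ones above?

From the record above, I need to extract these fields: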

f1: http://www.google.com/tools/dlpage/res/c/css/dlpage.css  
f2: 02/Oct/2010:00:00:38 +0530  
f3: je02121

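4 answers:

Answer 0 (score: 14)

Do it sequentially rather than all in one pattern (if you have many lines like this, split the input into lines first, and extract the compiled Pattern into a constant):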

String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("\\[.*?\\]|\\S+").matcher(input);
int nr = 0;
while (matcher.find()) {
    System.out.println("Match no. " + ++nr + ": '" + matcher.group() + "'");
}

Output:

Match no. 1: '1285957838.880'
Match no. 2: '1'
Match no. 3: '192.168.10.228'
Match no. 4: 'TCP_HIT/200'
Match no. 5: '1434'
Match no. 6: 'GET'
Match no. 7: 'http://www.google.com/tools/dlpage/res/c/css/dlpage.css'
Match no. 8: '[02/Oct/2010:00:00:38 +0530]'
Match no. 9: 'je02121'
Match no. 10: 'NONE/-'
Match no. 11: 'text/css'

Regex explanation:

\\[    match an opening square bracket
.*?    and anything up to a
\\]    closing square bracket
|      or
\\S+   any sequence of one or more non-whitespace characters
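
As a minimal sketch of how the loop above could be used to pick out only fields 7, 8 and 9: the class name `LogFieldTokenizer` and the `tokenize` helper are hypothetical names introduced for illustration, while the pattern is the one from this answer, compiled once as a constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldTokenizer {
    // Compiled once and reused for every line:
    // either a bracketed group or a run of non-whitespace characters.
    private static final Pattern FIELD = Pattern.compile("\\[.*?\\]|\\S+");

    // Hypothetical helper: returns all fields of one log line, in order.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
            + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
            + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(input);
        // The question counts fields from 1; list indices start at 0.
        System.out.println("f1: " + fields.get(6));
        System.out.println("f2: " + fields.get(7)); // still includes the square brackets
        System.out.println("f3: " + fields.get(8));
    }
}
```

Indexing into the list assumes every line really has at least nine fields; a size check before `get` would be a sensible addition for real log files.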

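Answer 1 (score: 5)

Assuming the only place where whitespace is allowed inside a field is between the brackets of the date field, and that there are no empty fields, you can use: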

Pattern regex = Pattern.compile(
    "^(?:\\S+\\s+){6}   # first 6 fields\n" +
    "(\\S+)\\s+         # field 7\n" +
    "\\[([^]]+)\\]\\s+  # field 8\n" +
    "(\\S+)             # field 9", 
    Pattern.MULTILINE | Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    for (int i = 1; i <= regexMatcher.groupCount(); i++) {
        // matched text: regexMatcher.group(i)
        // match start: regexMatcher.start(i)
        // match end: regexMatcher.end(i)
    }
} 
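
A minimal, runnable sketch of the same idea applied to the sample record. The class `LogFieldGroups` and the `extract` helper are hypothetical names; the pattern has the same structure as the one above, written on one line without COMMENTS mode and with the closing bracket escaped inside the character class (`[^\\]]`) for readability.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldGroups {
    // Skip six whitespace-separated fields, then capture fields 7, 8 and 9.
    private static final Pattern RECORD = Pattern.compile(
        "^(?:\\S+\\s+){6}(\\S+)\\s+\\[([^\\]]+)\\]\\s+(\\S+)",
        Pattern.MULTILINE);

    // Hypothetical helper: returns {field 7, field 8 without brackets, field 9},
    // or null when the line does not have the expected shape.
    static String[] extract(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
            + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
            + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        String[] f = extract(input);
        if (f != null) {
            System.out.println("f1: " + f[0]);
            System.out.println("f2: " + f[1]);
            System.out.println("f3: " + f[2]);
        }
    }
}
```

Because the brackets sit outside the second capturing group, field 8 comes back without them, which is the form the question asks for.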

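Answer 2 (score: 1)

Split on whitespace with the regex "\\s+" and store the result in an array, say s. (Use the greedy form: a reluctant "[\t\s]+?" matches one whitespace character at a time, so the run of spaces after the first field would produce empty array elements.)

Then s[6], s[7] + " " + s[8] and s[9] are the expected results; the date still carries its square brackets.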
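
The split approach might look like the sketch below. `LogFieldSplit` and `extract` are hypothetical names, and the code assumes the date field contains exactly one space, so that it is split into exactly two array elements.

```java
class LogFieldSplit {
    // Hypothetical helper: returns {field 7, field 8 without brackets, field 9}.
    static String[] extract(String line) {
        // Greedy \s+ treats a run of spaces as a single separator.
        String[] s = line.trim().split("\\s+");
        // The date was split at its inner space; rejoin the two halves.
        String date = s[7] + " " + s[8];
        // Drop the leading '[' and the trailing ']'.
        date = date.substring(1, date.length() - 1);
        return new String[] { s[6], date, s[9] };
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
            + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
            + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : extract(input)) {
            System.out.println(field);
        }
    }
}
```

This is the simplest variant, but it is also the most fragile: any other field that contains whitespace shifts every later index.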

Answer 3 (score: 0)

This option does not include the opening and closing brackets ([]) in the output:
    String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
    Matcher matcher = Pattern.compile("(\\d+/+\\w+/+\\d.* \\+\\d+)|([^\\[]\\S+[^\\]])").matcher(input);
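
The snippet above stops before the matching loop. As a hedged alternative, and not this answer's own pattern, one way to get bracket-free output is to put the bracket contents in a capturing group and take whichever group matched. The names `LogFieldUnbracketed` and `tokenize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldUnbracketed {
    // Group 1: text between square brackets; group 2: a plain non-whitespace token.
    private static final Pattern FIELD = Pattern.compile("\\[([^\\]]*)\\]|(\\S+)");

    // Hypothetical helper: returns all fields, with brackets stripped from bracketed ones.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
            + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
            + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(input);
        System.out.println(fields.get(6));
        System.out.println(fields.get(7));
        System.out.println(fields.get(8));
    }
}
```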