Parsing Jetty log records

Asked: 2011-04-20 11:15:44

Tags: java regex parsing logging

Given the following sample input:

70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] "GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1" 302 0 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)" 4 4

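I want to define the following parsing logic (probably a regex):

  1. Extract the IP: (3 digits, dot) * 4 => 70.80.110.200
  2. Extract the date => 12/Apr/2011
  3. Extract the time => 05:47:34
  4. Extract the URI (inside the request string, which starts with " and ends with ") => /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000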

4 answers:

Answer 0 (score: 3)

Try:

/^([0-9.]+).*?\[(\d+\/\w+\/\d+):(\d+:\d+:\d+).*?\].*?(\/[^ ]*).*$/

As you would expect, groups 1, 2, 3, and 4 then hold all the data you specified; for example, .group(3) is the time.

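Answer 1 (score: 2)

Make sure Jetty is configured for NCSA-compatible logging; you can then use any NCSA log analyzer to analyze the logs.

If you want to do it by hand, this is a good use case for regular expressions.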
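As a rough sketch of the by-hand route, the regex below names each field of the line instead of relying on group numbers. It assumes the sample follows the usual NCSA combined field order (host, ident, user, bracketed timestamp, quoted request, status, bytes, quoted referer, quoted user agent) and ignores anything after the user agent. The class name NcsaLineParser, the parse method, and the group names are made up for illustration, and named groups need Java 7 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Jetty or of any answer in this thread.
class NcsaLineParser {

    // Group names, in the order they appear in the line.
    private static final String[] FIELDS = {
        "ip", "date", "time", "zone", "method", "uri",
        "protocol", "status", "bytes", "referer", "agent"
    };

    // Assumed layout (NCSA combined):
    // host ident user [date:time zone] "method uri protocol" status bytes "referer" "agent" ...
    private static final Pattern LINE = Pattern.compile(
          "^(?<ip>\\S+)\\s+\\S+\\s+\\S+\\s+"
        + "\\[(?<date>[^:]+):(?<time>\\d{2}:\\d{2}:\\d{2})\\s+(?<zone>[+-]\\d{4})\\]\\s+"
        + "\"(?<method>\\S+)\\s+(?<uri>\\S+)\\s+(?<protocol>[^\"]+)\"\\s+"
        + "(?<status>\\d{3})\\s+(?<bytes>\\d+|-)\\s+"
        + "\"(?<referer>[^\"]*)\"\\s+\"(?<agent>[^\"]*)\".*$");

    // Returns the named fields in order, or an empty map if the line does not match.
    static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            for (String field : FIELDS) {
                out.put(field, m.group(field));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample line from the question, with the user agent shortened for readability.
        String sample = "70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] "
            + "\"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" "
            + "302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0)\" 4 4";
        parse(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Matching the whole line field by field is stricter than the shorter patterns in the other answers: a line that deviates from the assumed layout yields an empty map rather than partially wrong groups.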

Answer 2 (score: 1)

A complete code example (based on hsz's answer):

import java.util.regex.*;

public class RegexDemo {

  public static void main( String[] argv ) {
    String pat = "^([0-9.]*).*?\\[(\\d+\\/\\w+\\/\\d+):(\\d+:\\d+:\\d+).*?\\].*?(\\/[^ ]*).*$";
    Pattern p = Pattern.compile(pat);
    String target = "70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] \"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)\" 4 4";
    Matcher m = p.matcher(target);
    System.out.println("pattern: " + pat);
    System.out.println("target: " + target);

    if (m.matches()) {
      System.out.println("found");
      // group(0) is the whole match; groups 1-4 are the IP, date, time, and URI
      for (int i=0; i <= m.groupCount(); ++i) {
        System.out.println(m.group(i));
      }
    }
  }
}

Answer 3 (score: 0)

You can try the following (it needs import java.util.regex.*;):

String s = "70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] \"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)\" 4 4";
Pattern p = Pattern.compile("^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?" + //ip
                            "\\[([^:]*):"+ //date
                            "(\\d{2}:\\d{2}:\\d{2}).*?\\].*?"+ //time
                            "(/[^\\s]*).*$"); //uri

Matcher m = p.matcher(s);
if(m.find()){
    String ip = m.group(1);
    String date = m.group(2);
    String time = m.group(3);
    String uri = m.group(4);
}
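
The date and time captured above are plain strings. If a typed timestamp is more useful, one possible follow-up is to parse the whole bracketed value with java.time (Java 8 or later). This is only a sketch: it assumes you capture everything between the square brackets, including the zone offset, which none of the patterns in this thread does as a single group. The class name LogTimestamp and the parse method are made up for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for turning the bracketed log timestamp into a date-time value.
class LogTimestamp {

    // dd/MMM/yyyy:HH:mm:ss followed by an offset such as +0000.
    // Locale.ENGLISH is set so that "Apr" parses regardless of the default locale.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Expects the text between the square brackets, e.g. "12/Apr/2011:05:47:34 +0000".
    static OffsetDateTime parse(String bracketContent) {
        return OffsetDateTime.parse(bracketContent, FMT);
    }

    public static void main(String[] args) {
        OffsetDateTime ts = parse("12/Apr/2011:05:47:34 +0000");
        System.out.println(ts.toLocalDate()); // date part
        System.out.println(ts.toLocalTime()); // time part
    }
}
```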