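I have the following method for splitting a log line into fields. The log format is always exactly the same as the one below, but the values may change: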
29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13
String regex = "^([0-9-]*)\\s([0-9:]*)\\s([0-9\\\\.]*)\\s([0-9]*|-)\\s([0-9\\\\.]*)\\s([0-9]*)\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s([a-zA-Z0-9\\\\./]*)\\s([a-zA-Z0-9:./]*)\\s(.*)\\s(.*)";
String pattern = "$1~~$2~~$3~~$4~~$5~~$6~~$7~~$8~~$9~~$10~~$11~~$12~~$13~~$14";
String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";
List<Object> params = new ArrayList<Object>();
String formattedString = values.replaceAll(regex, pattern);
String[] fields = formattedString.split("~~");
for (String field : fields) {
    params.add(field);
}
System.out.println(params);
The log is not split correctly. The problem starts right after the URL http://in.sample.com/parties/: the user agent contains spaces, so the log separation does not work as expected.
Actual output:
[29-11-2013, 19:18:53, 192.2.2.22, 66, 192.2.2.22, 8080, GET, 402, 103, 103, HTTP/1.1, 192.2.2.22, http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29, Safari/525.13]
Expected output:
[29-11-2013, 19:18:53, 192.2.2.22, 66, 192.2.2.22, 8080, GET, 402, 103, 103, HTTP/1.1, 192.2.2.22, http://in.sample.com/parties/, Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML like Gecko) Chrome/0.2.149.29 Safari/525.13]
Any help would be great.
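Answer 0 (score: 1)
You don't need a regex to do this. Since your log always contains 14 fields, and the problematic spaces occur only in the last field, you just need to call the split method with its second parameter (limit):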
String[] fields = values.split(" ", 14);
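As a minimal sketch of how the limit behaves: with a limit of 14, the separator is applied at most 13 times, so everything after the 13th space stays in the last element. The class name `SplitWithLimit` and the helper `splitLogLine` are hypothetical names used only for illustration, and the sketch assumes fields are separated by exactly one space.

```java
import java.util.Arrays;
import java.util.List;

class SplitWithLimit {
    // Hypothetical helper: splits one log line into at most 14 fields.
    // Assumes a single space between fields; consecutive spaces would
    // produce empty fields and shift the later ones.
    static List<String> splitLogLine(String line) {
        // Limit 14: the separator is applied at most 13 times, so the
        // user agent (with its spaces) ends up intact in the last element.
        return Arrays.asList(line.split(" ", 14));
    }

    public static void main(String[] args) {
        String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 "
                + "HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ "
                + "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
                + "(KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";
        List<String> fields = splitLogLine(values);
        System.out.println(fields.size());   // number of fields
        System.out.println(fields.get(12));  // the URL
        System.out.println(fields.get(13));  // the whole user agent
    }
}
```

If a line has fewer than 14 space-separated parts, the returned list is simply shorter, so it may be worth checking its size before indexing into it.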
Answer 1 (score: 0)
I believe you missed the part that matches HTTP/1.1. Try this regex:
String regex = "(?i)^([0-9-]*)\\s([0-9:]*)\\s([0-9.]*)\\s([0-9]*|-)\\s([0-9.]*)\\s([0-9]*)\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s(HTTP/1\\.[01])\\s([A-Z0-9./]*)\\s([A-Z0-9:./]*)\\s(.*)";
It gives:
["29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13", "29-11-2013", "19:18:53", "192.2.2.22", "66", "192.2.2.22", "8080", "GET", "402", "103", "103", "HTTP/1.1", "192.2.2.22", "http://in.sample.com/parties/", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13"]
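For reference, here is a sketch of how such a pattern could be applied in Java with `Pattern` and `Matcher` instead of `replaceAll` followed by `split("~~")`. In the question's pattern the 13th group is a greedy `(.*)` followed by `\s(.*)`, so it swallows everything up to the last space. Here the URL group is restricted to URL-like characters and only the final group may contain spaces. The class name `LogLineParser` and the method `parse` are hypothetical, and the regex string is written with Java string escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // 14 capturing groups; only the last one (user agent) may contain spaces.
    private static final Pattern LOG_LINE = Pattern.compile(
            "(?i)^([0-9-]*)\\s([0-9:]*)\\s([0-9.]*)\\s([0-9]*|-)\\s([0-9.]*)\\s([0-9]*)"
            + "\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s(HTTP/1\\.[01])"
            + "\\s([A-Z0-9./]*)\\s([A-Z0-9:./]*)\\s(.*)");

    // Returns the 14 captured fields, or an empty list if the line does not match.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = LOG_LINE.matcher(line);
        if (m.matches()) {
            // Group 0 is the whole match, so start at group 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 "
                + "HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ "
                + "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
                + "(KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";
        System.out.println(parse(values));
    }
}
```

Reading the groups directly avoids the intermediate `~~`-joined string, which would break if any field ever contained `~~`.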
As an alternative, you could try to find and use a dedicated log parser.