I have lines like this in a log file, but my regular expression does not match them.
127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"
This is the code in my NetBeans project:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegExp1 {
    public static void main(String[] argv) {
        String logEntryPattern = "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) [a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?\\[([\\w:/] +\\s[+\\-]\\d{4})\\] \\\"(.+?)\\\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\\\" \"(.+?)\\\"";
        System.out.println("Using RE Pattern:");
        System.out.println(logEntryPattern);
        Pattern p = Pattern.compile(logEntryPattern);
        // try-with-resources closes the reader, and avoids a NullPointerException
        // in cleanup when the file cannot be opened
        try (BufferedReader buff = new BufferedReader(
                new FileReader("e3600_access_log2016-05-24.log"))) {
            String line;
            while ((line = buff.readLine()) != null) {
                Matcher matcher = p.matcher(line);
                System.out.println("groups: " + matcher.groupCount());
                if (!matcher.matches()) {
                    // Print the offending line plus the matcher state, then stop
                    System.err.println(line + matcher.toString());
                    return;
                }
                System.out.println("%a Remote IP Address : " + matcher.group(1));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This is the output I get:
Using RE Pattern:
^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\"
groups: 17
127.0.0.1 192.168.1.66 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"java.util.regex.Matcher[pattern=^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\" region=0,427 lastmatch=]
Any help with what I am doing wrong, and how to fix it so that I get the result I expect, would be appreciated. Thanks.
Answer 0 (score: 0)
Your pattern does not match the log entry. Use a tool such as http://regexr.com/ to debug the regular expression.
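If you would rather narrow the problem down in code, one possible approach is to grow the pattern piece by piece and check each prefix with Matcher.lookingAt(), which only requires a match at the start of the input. The sketch below is an illustration, not part of the original code: the class name PatternPrefixDebug, the helper firstFailingPiece and the places where the pattern is cut into PIECES are all my own choices.

```java
import java.util.regex.Pattern;

class PatternPrefixDebug {
    // The pattern from the question, cut into pieces at field boundaries.
    // The cut points are arbitrary; each piece only needs balanced brackets.
    static final String[] PIECES = {
        "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) ",
        "[a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?",
        "\\[([\\w:/] +\\s[+\\-]\\d{4})\\] ",
        "\\\"(.+?)\\\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) ",
        "\"(.+?)\\\" \"(.+?)\\\""
    };

    // The sample log line from the question.
    static final String SAMPLE =
        "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\" "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    // Returns the index of the first piece that stops the growing prefix
    // from matching at the start of the line, or -1 if every prefix matches.
    static int firstFailingPiece(String[] pieces, String line) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            prefix.append(pieces[i]);
            if (!Pattern.compile(prefix.toString()).matcher(line).lookingAt()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int failing = firstFailingPiece(PIECES, SAMPLE);
        System.out.println("first failing piece: " + failing);
        if (failing >= 0) {
            System.out.println(PIECES[failing]);
        }
    }
}
```

For the sample line this reports piece 2, the bracketed timestamp: [\w:/] matches a single character and is then followed by " +", which demands at least one space straight after the first digit of the date.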
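This modified pattern matches your sample input. The only change is in the timestamp group, where [\w:/] +\s became [\w:/]+ followed by a single space: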
^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/]+ [+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\"
This may not solve all of your problems, and the pattern still looks fragile. Test it against more log lines and adjust it as needed.
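As a rough sketch of how the modified pattern could be dropped into Java, the snippet below writes it as a string literal (every backslash doubled, every double quote escaped) and runs it against the sample line from the question. The class name FixedLogPatternDemo and the helper parse are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedLogPatternDemo {
    // The modified pattern from this answer, escaped for a Java string literal.
    static final Pattern FIXED = Pattern.compile(
        "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) "
        + "[a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?\\[([\\w:/]+ [+\\-]\\d{4})\\] "
        + "\\\"(.+?)\\\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\\\" \"(.+?)\\\"");

    // The sample log line from the question.
    static final String SAMPLE =
        "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\" "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    // Returns capture groups 1..n for a matching line, or an empty list otherwise.
    static List<String> parse(String line) {
        List<String> groups = new ArrayList<>();
        Matcher m = FIXED.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> groups = parse(SAMPLE);
        if (groups.isEmpty()) {
            System.err.println("no match");
            return;
        }
        // List index i holds regex group i + 1.
        System.out.println("Remote IP Address : " + groups.get(0));
        System.out.println("Timestamp         : " + groups.get(9));
        System.out.println("Status code       : " + groups.get(11));
    }
}
```

Returning an empty list for a non-matching line is just one convenient convention for a demo; real code might prefer to log the line and continue with the next one.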
Answer 1 (score: 0)
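This regular expression matches the sample log line and captures each field in its own group (15 groups in total).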
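Note: to use this regular expression in Java you need to replace every \ with \\. I have also left the expression for each substring on its own line. If you use the expression in this layout you need to include the "ignore whitespace" flag (Pattern.COMMENTS, or (?x), in Java), or simply join the expression into a single line. As written, the expression also relies on the case-insensitive flag: [a-z]+ has to match GET, may has to match May, and [0-9a-f]+ has to match the upper-case session ID. Keep in mind that this expression does not exhaustively validate the date or IP address substrings.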
^
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
([0-9]+)\s+
([0-9]+)\s+
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
-\s+
([a-z]+\s[0-9]+)\s+
(\?[^\s]+)\s+
-\s+
\[([0-9]{1,2}\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)\/[0-9]{4}(?::[0-9]{2}){3}\s+\+[0-9]{4})\]\s+
"([^"]+)"\s+
([0-9]+)\s+
([^\s]+)\s+
([0-9]+)\s+
([0-9a-f]+)\s+
"([^"]+)"\s+
"([^"]+)"
Live demo
https://regex101.com/r/mX7gG2/1
Sample text
127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"
Sample matches
[0][0] = 127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"
[0][1] = 127.0.0.1
[0][2] = 192.168.1.1
[0][3] = 1050
[0][4] = 1050
[0][5] = 127.0.0.1
[0][6] = GET 8080
[0][7] = ?action=edit&studentId=1
[0][8] = 24/May/2016:19:33:52 +0300
[0][9] = GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1
[0][10] = 200
[0][11] = /CRUDProject/StudentController.do
[0][12] = 264
[0][13] = ABADDD8AFB03ECC4791D76E543290226
[0][14] = Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
[0][15] = http://127.0.0.1:8080/CRUDProject/StudentController.do
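To reproduce these matches from Java, one option is to keep the expression in its one-field-per-line layout and compile it with Pattern.COMMENTS (ignore whitespace) together with Pattern.CASE_INSENSITIVE. The following is a minimal sketch under that assumption; the class name MultiLineLogRegexDemo and the helper parse are illustrative names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiLineLogRegexDemo {
    // One sub-expression per line, as in the answer, with \ doubled and " escaped.
    static final String REGEX = String.join("\n",
        "^",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "([0-9]+)\\s+",
        "([0-9]+)\\s+",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "-\\s+",
        "([a-z]+\\s[0-9]+)\\s+",
        "(\\?[^\\s]+)\\s+",
        "-\\s+",
        "\\[([0-9]{1,2}\\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)\\/[0-9]{4}"
            + "(?::[0-9]{2}){3}\\s+\\+[0-9]{4})\\]\\s+",
        "\"([^\"]+)\"\\s+",
        "([0-9]+)\\s+",
        "([^\\s]+)\\s+",
        "([0-9]+)\\s+",
        "([0-9a-f]+)\\s+",
        "\"([^\"]+)\"\\s+",
        "\"([^\"]+)\"");

    // COMMENTS ignores the newlines between the pieces; CASE_INSENSITIVE lets
    // [a-z]+ match GET and "may" match May.
    static final Pattern LOG_LINE =
        Pattern.compile(REGEX, Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // The sample log line from the question.
    static final String SAMPLE =
        "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\" "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    // Returns capture groups 1..n for a matching line, or an empty list otherwise.
    static List<String> parse(String line) {
        List<String> groups = new ArrayList<>();
        Matcher m = LOG_LINE.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> groups = parse(SAMPLE);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("[" + (i + 1) + "] = " + groups.get(i));
        }
    }
}
```

Joining the lines with String.join keeps the Java source close to the layout of the expression above, which makes it easier to compare the two when a field needs adjusting.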
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture (3 times):
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9' (between
1 and 3 times (matching the most
amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
){3} end of grouping
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9' (between 1
and 3 times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
(?: group, but do not capture (3 times):
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9' (between
1 and 3 times (matching the most
amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
){3} end of grouping
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9' (between 1
and 3 times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \5:
----------------------------------------------------------------------
(?: group, but do not capture (3 times):
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9' (between
1 and 3 times (matching the most
amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
){3} end of grouping
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9' (between 1
and 3 times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \5
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
- '-'
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \6:
----------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \6
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \7:
----------------------------------------------------------------------
\? '?'
----------------------------------------------------------------------
[^\s]+ any character except: whitespace (\n,
\r, \t, \f, and " ") (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \7
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
- '-'
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\[ '['
----------------------------------------------------------------------
( group and capture to \8:
----------------------------------------------------------------------
[0-9]{1,2} any character of: '0' to '9' (between 1
and 2 times (matching the most amount
possible))
----------------------------------------------------------------------
\/ '/'
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
Jan 'Jan'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
feb 'feb'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
Mar 'Mar'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
apr 'apr'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
may 'may'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
Jun 'Jun'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
July 'July'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
Aug 'Aug'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
Sep 'Sep'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
Oct 'Oct'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
Nov 'Nov'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
Dec 'Dec'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
\/ '/'
----------------------------------------------------------------------
[0-9]{4} any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
(?: group, but do not capture (3 times):
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
[0-9]{2} any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
){3} end of grouping
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
----------------------------------------------------------------------
\+ '+'
----------------------------------------------------------------------
[0-9]{4} any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
) end of \8
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
( group and capture to \9:
----------------------------------------------------------------------
[^"]+ any character except: '"' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \9
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \10:
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \10
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \11:
----------------------------------------------------------------------
[^\s]+ any character except: whitespace (\n,
\r, \t, \f, and " ") (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \11
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \12:
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \12
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \13:
----------------------------------------------------------------------
[0-9a-f]+ any character of: '0' to '9', 'a' to 'f'
(1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \13
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
( group and capture to \14:
----------------------------------------------------------------------
[^"]+ any character except: '"' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \14
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
( group and capture to \15:
----------------------------------------------------------------------
[^"]+ any character except: '"' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \15
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------