I have a file containing records in the following format:
1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css
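Each record has 11 fields ([02/Oct/2010:00:00:38 +0530] is a single field).
I want to extract some of the fields, say 7, 8 and 9. Is it possible to extract these fields with a Java regex?
Can a single regex be used to match several patterns like the ones above?
From the record above, I need to extract these fields: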
f1: http://www.google.com/tools/dlpage/res/c/css/dlpage.css
f2: 02/Oct/2010:00:00:38 +0530
f3: je02121
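Answer 0 (score: 14)
Match the fields one after another rather than capturing them all in a single pattern (if you have many such lines, split the input into lines first, and extract the compiled Pattern into a constant):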
String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("\\[.*?\\]|\\S+").matcher(input);
int nr = 0;
while (matcher.find()) {
    System.out.println("Match no. " + ++nr + ": '" + matcher.group() + "'");
}
Output:
Match no. 1: '1285957838.880'
Match no. 2: '1'
Match no. 3: '192.168.10.228'
Match no. 4: 'TCP_HIT/200'
Match no. 5: '1434'
Match no. 6: 'GET'
Match no. 7: 'http://www.google.com/tools/dlpage/res/c/css/dlpage.css'
Match no. 8: '[02/Oct/2010:00:00:38 +0530]'
Match no. 9: 'je02121'
Match no. 10: 'NONE/-'
Match no. 11: 'text/css'
Regex explanation:
\\[    match an opening square bracket
.*?    and anything up to a
\\]    closing square bracket
|      or
\\S+   any sequence of one or more non-whitespace characters
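To get from the full list of matches to just the three fields from the question, the matches can be collected into a list and indexed. The following is a minimal sketch under that assumption; the class name FieldTokenizer and the method tokenize are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldTokenizer {
    // Compiled once and reused. The bracketed alternative is listed first so
    // that "[...]" is taken as one field before \S+ can split it at the space.
    private static final Pattern FIELD = Pattern.compile("\\[.*?\\]|\\S+");

    // Returns all fields of one log line in order (index 0 holds field 1).
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(input);
        System.out.println("f1: " + fields.get(6)); // field 7, the URL
        System.out.println("f2: " + fields.get(7)); // field 8, brackets still attached
        System.out.println("f3: " + fields.get(8)); // field 9, the user name
    }
}
```

Note that field 8 comes back with its surrounding brackets, exactly as in the output listed above; they would have to be stripped separately if the bare date is wanted.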
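Answer 1 (score: 5)
Assuming the only place where whitespace is allowed inside a field is between the brackets of the date field, and that there are no empty fields, you can use: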
Pattern regex = Pattern.compile(
    "^(?:\\S+\\s+){6} # first 6 fields\n" +
    "(\\S+)\\s+ # field 7\n" +
    "\\[([^]]+)\\]\\s+ # field 8\n" +
    "(\\S+) # field 9",
    Pattern.MULTILINE | Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    for (int i = 1; i <= regexMatcher.groupCount(); i++) {
        // matched text: regexMatcher.group(i)
        // match start: regexMatcher.start(i)
        // match end: regexMatcher.end(i)
    }
}
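For reference, here is a self-contained sketch of the same single-pattern idea applied to the sample record from the question. The class name FieldExtractor and the method extract are hypothetical. As a deliberate variation, the sketch drops Pattern.COMMENTS in favour of ordinary Java comments and escapes the closing bracket inside the character class as [^\\]]; it relies on the same assumptions about whitespace and empty fields stated above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldExtractor {
    private static final Pattern RECORD = Pattern.compile(
        "^(?:\\S+\\s+){6}"          // skip the first 6 fields
        + "(\\S+)\\s+"              // group 1: field 7
        + "\\[([^\\]]+)\\]\\s+"     // group 2: field 8, without the brackets
        + "(\\S+)",                 // group 3: field 9
        Pattern.MULTILINE);

    // Returns {field 7, field 8, field 9}, or null if the line does not match.
    static String[] extract(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        String[] f = extract(input);
        System.out.println("f1: " + f[0]);
        System.out.println("f2: " + f[1]);
        System.out.println("f3: " + f[2]);
    }
}
```

Because the brackets sit outside the second capturing group, the date comes back as 02/Oct/2010:00:00:38 +0530, which matches the f2 value the question asks for.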
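Answer 2 (score: 1)
Split on the regex "[\t\s]+?" and store the result in an array, say s.
Then s[6], s[7] + " " + s[8] (the date is split at its internal space and keeps its brackets) and s[9] will be the expected results.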
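The split approach could look roughly like the sketch below. It assumes every record has exactly the layout of the sample line, so the date always lands in s[7] and s[8]; the class name SplitExtractor and the method extract are made up for illustration, and the sketch splits on \s+ rather than the lazy pattern so that runs of whitespace do not produce empty array entries.

```java
class SplitExtractor {
    // Returns {field 7, field 8 without brackets, field 9}.
    // Assumes the date is the only field containing whitespace.
    static String[] extract(String line) {
        String[] s = line.trim().split("\\s+");
        // The date was cut at its internal space: rejoin the two halves.
        String date = s[7] + " " + s[8];
        // Drop the leading '[' and the trailing ']'.
        date = date.substring(1, date.length() - 1);
        return new String[] { s[6], date, s[9] };
    }

    public static void main(String[] args) {
        String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : extract(input)) {
            System.out.println(field);
        }
    }
}
```

This is simpler than a capturing regex, but it breaks silently if any other field ever contains whitespace, since the array indices shift.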
Answer 3 (score: 0)
This option does not include the opening and closing brackets ([]) in the output:
String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("(\\d+/+\\w+/+\\d.* \\+\\d+)|([^\\[]\\S+[^\\]])").matcher(input);
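Another way to leave the brackets out of the output is to put a capturing group around the bracket contents and read that group instead of the whole match. The sketch below is an alternative formulation, not the pattern from this answer; the class name BracketFreeTokenizer and the method tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketFreeTokenizer {
    // Group 1: the text inside "[...]"; group 2: a plain non-whitespace token.
    private static final Pattern FIELD = Pattern.compile("\\[([^\\]]*)\\]|(\\S+)");

    // Returns all fields of one log line, with brackets stripped from
    // bracketed fields.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(input);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("Field " + (i + 1) + ": '" + fields.get(i) + "'");
        }
    }
}
```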