Splitting a log file with a regular expression

Asked: 2016-05-24 17:57:08

Tags: java regex

I have lines like the following in a log file, but my regular expression does not match them:

127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"

This is my code in a NetBeans project:

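import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegExp1 {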

public static void main(String argv[]) {
    FileReader myFile = null;
    BufferedReader buff = null;

    String logEntryPattern = "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) [a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?\\[([\\w:/] +\\s[+\\-]\\d{4})\\] \\\"(.+?)\\\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\\\" \"(.+?)\\\"";  
    System.out.println("Using RE Pattern:");
    System.out.println(logEntryPattern);

    Pattern p = Pattern.compile(logEntryPattern);

    try {
        myFile = new FileReader("e3600_access_log2016-05-24.log");
        buff = new BufferedReader(myFile);

        while (true) {
            String line = buff.readLine();
            if (line == null) {
                break;
            }

            Matcher matcher = p.matcher(line);
            System.out.println("groups: " + matcher.groupCount());
            if (!matcher.matches()) {
                System.err.println(line + matcher.toString());
                return;
            }

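            System.out.println("%a Remote IP Address     : " + matcher.group(1));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            // Guard against a failed open, which would leave these null.
            if (buff != null) {
                buff.close();
            }
            if (myFile != null) {
                myFile.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
}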

This is the result I get:

Using RE Pattern:
^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\"
groups: 17
127.0.0.1 192.168.1.66 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"  "http://127.0.0.1:8080/CRUDProject/StudentController.do"java.util.regex.Matcher[pattern=^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\" region=0,427 lastmatch=]

Any help with what I am doing wrong, and how to fix it so that I get the result I should be getting, would be appreciated. Thanks.

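2 Answers:

Answer 0 (score: 0)

Your pattern does not match the log entry. Use a tool such as http://regexr.com/ to debug the regular expression.

This modified pattern matches your sample input. It differs from yours in two places: in the timestamp group the + quantifier now applies to the character class [\w:/] instead of to the space after it (and the extra \s is gone), and two spaces rather than one separate the last two quoted fields, as in the line your program printed: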

^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/]+ [+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\"  "(.+?)\"

This may not solve all of your problems; the pattern still looks fragile. Test it some more and adjust it as needed.
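As a rough sketch of how the modified pattern could be tried from Java, the class below applies it to the line printed in the question's error output. The class name ModifiedPatternDemo, the parse helper, and the SAMPLE_LINE constant are illustrative names that do not appear in the question. Every backslash from the pattern is doubled for the Java string literal, and the pattern's \" is written as an escaped quote, which the regex engine treats as a plain ".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModifiedPatternDemo {

    // Modified pattern from this answer, escaped for a Java string literal.
    static final String LOG_ENTRY_PATTERN =
        "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) [a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?"
        + "\\[([\\w:/]+ [+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\"  \"(.+?)\"";

    // The line from the question's error output (two spaces before the last quoted field).
    static final String SAMPLE_LINE =
        "127.0.0.1 192.168.1.66 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\"  "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    /** Returns the capture groups of the line, or null when the whole line does not match. */
    static String[] parse(String line) {
        Matcher m = Pattern.compile(LOG_ENTRY_PATTERN).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = parse(SAMPLE_LINE);
        if (groups == null) {
            System.err.println("No match");
            return;
        }
        System.out.println("Remote IP Address : " + groups[0]);
        System.out.println("Timestamp         : " + groups[9]);
        System.out.println("Status code       : " + groups[11]);
    }
}
```

Because the pattern hard-codes the number of spaces between fields, a line with a single space before the last quoted field would not match; replacing the literal spaces with \\s+ is one possible way to loosen that.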

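Answer 1 (score: 0)

Description

This regular expression will do the following:

  • Match all of the substrings in the log message
  • Place each matched substring into its own capture group

Note: to use this regular expression in Java you need to replace every \ with \\ (and escape each " as \"). I have also left the expression for each substring on its own line. If you use the expression in this format you need to include the "ignore whitespace" flag (Pattern.COMMENTS in Java), or simply put the expression on a single line. The sample matches below also rely on case-insensitive matching, because [a-z]+, may and [0-9a-f]+ are matched against GET, May and an uppercase hexadecimal ID. Keep in mind that this expression does not exhaustively validate the date or IP address substrings.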

^
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
([0-9]+)\s+
([0-9]+)\s+
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
-\s+
([a-z]+\s[0-9]+)\s+
(\?[^\s]+)\s+
-\s+
\[([0-9]{1,2}\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)\/[0-9]{4}(?::[0-9]{2}){3}\s+\+[0-9]{4})\]\s+
"([^"]+)"\s+
([0-9]+)\s+
([^\s]+)\s+
([0-9]+)\s+
([0-9a-f]+)\s+
"([^"]+)"\s+
"([^"]+)"


Example

Live demo

https://regex101.com/r/mX7gG2/1

Sample text

127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"  "http://127.0.0.1:8080/CRUDProject/StudentController.do"

Sample matches

[0][0] = 127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"  "http://127.0.0.1:8080/CRUDProject/StudentController.do"
[0][1] = 127.0.0.1
[0][2] = 192.168.1.1
[0][3] = 1050
[0][4] = 1050
[0][5] = 127.0.0.1
[0][6] = GET 8080
[0][7] = ?action=edit&studentId=1
[0][8] = 24/May/2016:19:33:52 +0300
[0][9] = GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1
[0][10] = 200
[0][11] = /CRUDProject/StudentController.do
[0][12] = 264
[0][13] = ABADDD8AFB03ECC4791D76E543290226
[0][14] = Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
[0][15] = http://127.0.0.1:8080/CRUDProject/StudentController.do
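The groups listed above could be reproduced in Java along the following lines. This is only a sketch: the class name FreeSpacingLogParser, the parse helper, and the SAMPLE_LINE constant are made up for illustration. It keeps the expression one piece per line, so it compiles with Pattern.COMMENTS (Java's "ignore whitespace" mode), and it adds Pattern.CASE_INSENSITIVE on the assumption that the demo was run case-insensitively, since [a-z]+ has to match GET.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FreeSpacingLogParser {

    // The answer's expression, one piece per line, with \ doubled and " escaped for Java.
    // COMMENTS makes the newlines between pieces insignificant; CASE_INSENSITIVE lets
    // [a-z]+, "may" and [0-9a-f]+ match "GET", "May" and the uppercase hex ID.
    static final Pattern LOG_LINE = Pattern.compile(String.join("\n",
        "^",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "([0-9]+)\\s+",
        "([0-9]+)\\s+",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "-\\s+",
        "([a-z]+\\s[0-9]+)\\s+",
        "(\\?[^\\s]+)\\s+",
        "-\\s+",
        "\\[([0-9]{1,2}\\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)\\/[0-9]{4}(?::[0-9]{2}){3}\\s+\\+[0-9]{4})\\]\\s+",
        "\"([^\"]+)\"\\s+",
        "([0-9]+)\\s+",
        "([^\\s]+)\\s+",
        "([0-9]+)\\s+",
        "([0-9a-f]+)\\s+",
        "\"([^\"]+)\"\\s+",
        "\"([^\"]+)\""),
        Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // The sample text from this answer.
    static final String SAMPLE_LINE =
        "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\"  "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    /** Returns capture groups 1..15 of the first match, or null when nothing matches. */
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = parse(SAMPLE_LINE);
        if (groups == null) {
            System.err.println("No match");
            return;
        }
        for (int i = 0; i < groups.length; i++) {
            System.out.println("[0][" + (i + 1) + "] = " + groups[i]);
        }
    }
}
```

Joining the pieces with newlines keeps the Java source aligned with the expression as written above; concatenating them into a single line and dropping Pattern.COMMENTS should behave the same way, because the expression contains no literal spaces or # characters.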

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \6:
----------------------------------------------------------------------
    [a-z]+                   any character of: 'a' to 'z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \6
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \7:
----------------------------------------------------------------------
    \?                       '?'
----------------------------------------------------------------------
    [^\s]+                   any character except: whitespace (\n,
                             \r, \t, \f, and " ") (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \7
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (                        group and capture to \8:
----------------------------------------------------------------------
    [0-9]{1,2}               any character of: '0' to '9' (between 1
                             and 2 times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      Jan                      'Jan'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      feb                      'feb'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Mar                      'Mar'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      apr                      'apr'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      may                      'may'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Jun                      'Jun'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      July                     'July'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Aug                      'Aug'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Sep                      'Sep'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Oct                      'Oct'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Nov                      'Nov'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Dec                      'Dec'
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      :                        ':'
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \+                       '+'
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
  )                        end of \8
----------------------------------------------------------------------
  \]                       ']'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \9:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \9
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \10:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \10
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \11:
----------------------------------------------------------------------
    [^\s]+                   any character except: whitespace (\n,
                             \r, \t, \f, and " ") (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \11
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \12:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \12
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \13:
----------------------------------------------------------------------
    [0-9a-f]+                any character of: '0' to '9', 'a' to 'f'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \13
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \14:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \14
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \15:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \15
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
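
As the note in the description says, the expression does not validate IP addresses exhaustively. The sketch below illustrates what that means for the sub-expression used in groups \1, \2 and \5: it accepts any four runs of one to three digits. The isValidIpv4 helper and the class name LooseIpCheck are hypothetical additions for illustration, not part of the answer.

```java
import java.util.regex.Pattern;

public class LooseIpCheck {

    // The IPv4 sub-expression used for groups \1, \2 and \5 in the answer.
    static final Pattern LOOSE_IP = Pattern.compile("(?:[0-9]{1,3}\\.){3}[0-9]{1,3}");

    /** Hypothetical stricter check: same shape, and every octet must be at most 255. */
    static boolean isValidIpv4(String s) {
        if (!LOOSE_IP.matcher(s).matches()) {
            return false;
        }
        for (String octet : s.split("\\.")) {
            if (Integer.parseInt(octet) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"127.0.0.1", "999.999.999.999"}) {
            System.out.println(candidate
                + " loose=" + LOOSE_IP.matcher(candidate).matches()
                + " strict=" + isValidIpv4(candidate));
        }
    }
}
```

For parsing lines that a server wrote itself, the loose form is usually enough; a range check like this one would only matter if the captured addresses are later treated as trusted input.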