Question

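I'm trying to parse a log file with regular expressions. I understand how the first regex pulls out the IP address, but I'm stuck on how to get past it and parse the rest of the line. To parse the rest, do I just write a regex for the date, and so on? I want the second element to be the second IP, 72.37.100.86. Then I want to skip the "- - -", take the date as the 4th element, "GET / HTTP/1.1" as the 8th index, and the status code 200 as the 9th index. Any help on what I need to do next would be much appreciated.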
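Instead of counting on fixed positions, one option is to keep calling `find()` with the IP pattern from the question and collect every match in order. In that list, the second IP (72.37.100.86) is simply element 1. The sketch below makes a few assumptions: the class name `IpCollector` and the helper `findIps` are hypothetical, and the sample line is the question's entry with the user-agent string cut off. The pattern checks only the dotted-quad shape, not the 0 to 255 range, so `192.36.20.508` is matched too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpCollector {
    // Same dotted-quad pattern as in the question; it checks shape only, not the 0-255 range
    private static final Pattern IP =
            Pattern.compile("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");

    // Hypothetical helper: returns every IP-shaped substring in order of appearance
    static List<String> findIps(String line) {
        List<String> ips = new ArrayList<>();
        Matcher m = IP.matcher(line);
        while (m.find()) {        // each call resumes after the previous match
            ips.add(m.group());
        }
        return ips;
    }

    public static void main(String[] args) {
        // Sample line assumed from the question, with the user agent removed
        String line = "10.10.100.151 - 72.37.100.86, 192.36.20.508 - - - "
                + "[04/Jul/2016:12:50:06 +0000]  https https https \"GET / HTTP/1.1\" 200 20027";
        List<String> ips = findIps(line);
        System.out.println(ips);        // [10.10.100.151, 72.37.100.86, 192.36.20.508]
        System.out.println(ips.get(1)); // 72.37.100.86
    }
}
```

This approach only works for the IPs, though. The date, request and status code still need their own patterns, so a single expression with capture groups generally scales better.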

package com.text.nginx_log_parser;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegExTester {


// Actual Entry : 10.10.100.151 - 72.37.100.86, 192.36.20.508 - - - [04/Jul/2016:12:50:06 +0000]  https https https "GET / HTTP/1.1" 200 20027 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36"
public static String logEntry = "10.10.100.151 - 72.37.100.86, 192.36.20.508 - - - [04/Jul/2016:12:50:06 +0000]  https https https \"GET / HTTP/1.1\" 200 20027 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36\"\r\n";

//public static String regex = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})";
public static void main (String [] args){

    String regex = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s*-*\\s*-*\\s*-*";
    regexChecker(regex, logEntry);
    regex = "\\[*\\]\\s.";
    regexChecker(regex, logEntry);
}

public static void regexChecker(String regex, String str){

    Pattern pattern = Pattern.compile(regex);

    // Match against the string passed in as the str parameter
    Matcher matcher = pattern.matcher(str);
    //String firstIP = matcher.group(0);
    //String secondIP = matcher.group();
    //String timestamp = 
    while(matcher.find()){
        System.out.println( matcher.group(0));
    }
  }
}
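The second pattern in `main`, `\[*\]\s.`, matches zero or more `[` characters followed by `]`, one whitespace character and any one character. On this entry, it finds only the closing bracket and the two spaces after it, not the date. Here is a minimal sketch that captures the text between the brackets instead and, optionally, parses it with `java.time`. The class name `TimestampExtractor`, its helper methods and the shortened sample line are assumptions made for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimestampExtractor {
    // '[' then one or more non-']' characters (captured as group 1) then ']'
    static final Pattern TIMESTAMP = Pattern.compile("\\[([^\\]]+)\\]");

    // Assumed layout of the bracketed time, e.g. 04/Jul/2016:12:50:06 +0000
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Hypothetical helper: returns the text inside the first [...] or null if there is none
    static String extractTimestamp(String line) {
        Matcher m = TIMESTAMP.matcher(line);
        return m.find() ? m.group(1) : null;  // group(1) excludes the brackets
    }

    // Hypothetical helper: turns the raw bracket text into a date-time with offset
    static OffsetDateTime parseTimestamp(String raw) {
        return OffsetDateTime.parse(raw, FORMAT);
    }

    public static void main(String[] args) {
        String line = "10.10.100.151 - 72.37.100.86, 192.36.20.508 - - - "
                + "[04/Jul/2016:12:50:06 +0000]  https https https \"GET / HTTP/1.1\" 200 20027";
        String raw = extractTimestamp(line);
        System.out.println(raw);                 // 04/Jul/2016:12:50:06 +0000
        System.out.println(parseTimestamp(raw)); // 2016-07-04T12:50:06Z
    }
}
```

`Locale.ENGLISH` is passed so that the month abbreviation `Jul` parses regardless of the JVM's default locale.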

Answer 1

Use the following regular expression:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})[-\s]+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).+?\[(.+?)\].*?\"(.+?)\"\s(\d{3}).*$
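Here is a sketch of how the expression above might be used from Java. Every backslash is doubled for the string literal, and the capture groups are read after a successful `find()`. The class name `LogLineParser`, the `parse` helper and the shortened sample line are assumptions, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // The answer's regex, split into pieces and escaped for a Java string literal
    static final Pattern LOG_LINE = Pattern.compile(
            "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})[-\\s]+" // 1: first IP, then " - "
            + "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})"      // 2: second IP
            + ".+?\\[(.+?)\\]"                                    // 3: timestamp inside [...]
            + ".*?\"(.+?)\""                                      // 4: request inside quotes
            + "\\s(\\d{3}).*$");                                  // 5: three-digit status code

    // Hypothetical helper: returns groups 1..5 as an array, or null if the line does not match
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);  // group 0 is the whole match, so start at 1
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "10.10.100.151 - 72.37.100.86, 192.36.20.508 - - - "
                + "[04/Jul/2016:12:50:06 +0000]  https https https \"GET / HTTP/1.1\" 200 20027 \"-\"";
        String[] fields = parse(line);
        if (fields != null) {
            for (int i = 0; i < fields.length; i++) {
                System.out.println((i + 1) + ": " + fields[i]);
            }
        }
    }
}
```

For the sample line, this should print `1: 10.10.100.151`, `2: 72.37.100.86`, `3: 04/Jul/2016:12:50:06 +0000`, `4: GET / HTTP/1.1` and `5: 200`, one per line.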

See it in action in this entry on regex101.com.

The fields you want are in capture groups 1 to 5.
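If numbered groups 1 to 5 feel fragile, Java (7 and later) also supports named groups written as `(?<name>...)`, read back with `Matcher.group(String)`. Here is a sketch of the same expression with names attached. The group names, the class `NamedGroupParser` and the map-returning helper are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupParser {
    private static final String IP = "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}";

    // Same structure as the answer's regex, but each capture group has a (hypothetical) name
    static final Pattern LOG_LINE = Pattern.compile(
            "(?<firstIp>" + IP + ")[-\\s]+"
            + "(?<secondIp>" + IP + ")"
            + ".+?\\[(?<timestamp>.+?)\\]"
            + ".*?\"(?<request>.+?)\""
            + "\\s(?<status>\\d{3}).*$");

    static final String[] NAMES = {"firstIp", "secondIp", "timestamp", "request", "status"};

    // Hypothetical helper: name -> value in pattern order; empty map if the line does not match
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LOG_LINE.matcher(line);
        if (m.find()) {
            for (String name : NAMES) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "10.10.100.151 - 72.37.100.86, 192.36.20.508 - - - "
                + "[04/Jul/2016:12:50:06 +0000]  https https https \"GET / HTTP/1.1\" 200 20027 \"-\"";
        Map<String, String> fields = parse(line);
        System.out.println(fields.get("secondIp")); // 72.37.100.86
        System.out.println(fields.get("status"));   // 200
    }
}
```

The names make the calling code independent of group numbering, so adding another group later does not shift the existing ones.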
