Log parsing with regular expressions

Time: 2017-03-30 14:56:39

Tags: java regex

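I am using regular expressions to parse logs. Until now I have been reading the file into a list of strings and iterating over it: if a line does not start with a timestamp, I append it to the current entry; otherwise I store the entry accumulated so far and start a new one with that line. Once I have a complete log entry, I parse it with a second regex.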

Scanning the file

try {
    List<String> lines = Files.readAllLines(filepath);

    Pattern pattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");
    Matcher matcher;
    String currentEntry = "";
    for(String line : lines) {
        matcher = pattern.matcher(line);
        // If this is a new entry, then wrap up the previous one and start again
        if ( matcher.lookingAt() ) {
            // If the previous entry was not empty
            if(!StringUtils.trimWhitespace(currentEntry).isEmpty()) {
                entries.add(new LogEntry(currentEntry));
            }

            // Clear the current entry
            currentEntry = "";
        }

        if (!currentEntry.trim().isEmpty())
            currentEntry += "\n";
        currentEntry += line;
    }
    // At the end, if we have one leftover entry, add it
    if (!currentEntry.isEmpty()) {
        entries.add(new LogEntry(currentEntry));
    }
}catch (Exception ex){
    return null;
}

Parsing an entry

final private static String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final private static String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final private static String classRgx = "\\[(?<class>[^]]+)\\]";
final private static String threadRgx = "\\[(?<thread>[^]]+)\\]";
final private static String textRgx = "(?<text>.*)";
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);

public LogEntry(String logText) {

    try {
        Matcher matcher = PatternFullLog.matcher(logText);
        matcher.find();

        String dateStr = matcher.group("timestamp");
        timestamp = new DateLogLevel();
        timestamp.parseLogDate(dateStr);

        String levelStr = matcher.group("level");
        loglevel = LOG_LEVEL.valueOf(levelStr);
        String fullClassStr = matcher.group("class");

        String[] classNameArray = fullClassStr.split("\\.");
        framework = classNameArray[2];
        classname = classNameArray[classNameArray.length - 1];
        threadname = matcher.group("thread");
        logtext = matcher.group("text");
        notes = "";

    } catch (Exception ex) {
        throw ex;
    }
}

What I'm trying to figure out

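What I really want to do is read the whole file in as a single string and then parse it entry by entry with a single regex, in one pass. My plan was to reuse the expression from the constructor, but have the log text group end at EOF or at the start of the next log line, like so: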

final String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final String classRgx = "\\[(?<class>[^]]+)\\]";
final String threadRgx = "\\[(?<thread>[^]]+)\\]";
final String textRgx = "(?<text>.*[^(\Z|\\d{4}\-\\d{2}\-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})"; // change to handle multiple lines
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);

try {
    // Read file into string
    String lines = readFile(filepath);

    // Use the full pattern here: the timestamp-only pattern has no named groups
    Matcher matcher = PatternFullLog.matcher(lines);
    while (matcher.find()) {
        String dateStr = matcher.group("timestamp");
        timestamp = new DateLogLevel();
        timestamp.parseLogDate(dateStr);

        String levelStr = matcher.group("level");
        loglevel = LOG_LEVEL.valueOf(levelStr);
        String fullClassStr = matcher.group("class");

        String[] classNameArray = fullClassStr.split("\\.");
        framework = classNameArray[2];
        classname = classNameArray[classNameArray.length - 1];
        threadname = matcher.group("thread");
        logtext = matcher.group("text");
        entries.add(
            new LogEntry(
                timestamp,
                loglevel,
                framework,
                threadname,
                logtext,
                ""/* Notes are empty when importing new file */));
    }

}catch (Exception ex){
    return null;
}

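The problem is that I can't seem to get the last group (textRgx) to match across multiple lines, up to the next timestamp or the end of the file. Does anyone have any ideas?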

Sample log entries

2017-03-14 22:43:14,405 FATAL [org.springframework.web.context.support.XmlWebApplicationContext]-[localhost-startStop-1] Refreshing Root WebApplicationContext: startup date [Tue Mar 14 22:43:14 UTC 2017]; root of context hierarchy
2017-03-14 22:43:14,476 INFO  [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Loading XML bean definitions from Serv
2017-03-14 22:43:14,476 INFO  [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with another entry after
2017-03-14 22:43:14,476 INFO  [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with no entries after

1 answer:

Answer 0 (score: 1)

You need to define the pattern like this:

final static String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final static String levelRgx = "(?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)";
final static String classRgx = "\\[(?<class>[^\\]]+)]";
final static String threadRgx = "\\[(?<thread>[^\\]]+)]";
final static String textRgx = "(?<text>.*?)(?=\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}|\\Z)";
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx, Pattern.DOTALL);

Then you can use it like this:

Matcher matcher = PatternFullLog.matcher(lines); // lines holds the whole file as one string

See the Java demo.
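
As a rough end-to-end sketch of this approach, the self-contained class below applies the pattern to an abridged copy of the sample log and collects the fields of each entry. The class name LogParseDemo, the parse helper, and the String[] result shape are illustrative assumptions, not part of the original code; a real implementation would presumably build LogEntry objects instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogParseDemo {
    static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}";

    // Same structure as the answer's pattern: the text group is lazy and is
    // bounded by a lookahead for the next timestamp or the end of input.
    static final Pattern FULL_LOG = Pattern.compile(
        "(?<timestamp>" + TIMESTAMP + ") "
        + "(?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)\\s+"
        + "\\[(?<class>[^\\]]+)]-\\[(?<thread>[^\\]]+)]\\s+"
        + "(?<text>.*?)(?=" + TIMESTAMP + "|\\Z)",
        Pattern.DOTALL);

    // Abridged from the sample log in the question (first message shortened).
    static final String SAMPLE =
        "2017-03-14 22:43:14,405 FATAL [org.springframework.web.context.support.XmlWebApplicationContext]-[localhost-startStop-1] Refreshing Root WebApplicationContext\n"
        + "2017-03-14 22:43:14,476 INFO  [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline\n"
        + "log entry with another entry after\n"
        + "2017-03-14 22:43:14,476 INFO  [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline\n"
        + "log entry with no entries after\n";

    // Returns one array per entry: timestamp, level, class, thread, text.
    static List<String[]> parse(String content) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = FULL_LOG.matcher(content);
        while (m.find()) {
            entries.add(new String[] {
                m.group("timestamp"),
                m.group("level"),
                m.group("class"),
                m.group("thread"),
                m.group("text").trim() // drop the line break left before the next entry
            });
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String[] entry : parse(SAMPLE)) {
            System.out.println(String.join(" | ", entry));
        }
    }
}
```

Because the lazy text group stops right before the next timestamp, it still contains the trailing line break, which is why the sketch trims it. Note also that the lookahead is not anchored to the start of a line, so a message that itself contains a timestamp in this format would be split at that point.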

Here is what the pattern looks like:

(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)\s+\[(?<class>[^\]]+)]-\[(?<thread>[^\]]+)]\s+(?<text>.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z)

See the regex demo.

A few notes:

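  • You had some issues with escaping: the ] inside a character class should be escaped, and \- should be replaced with -.
  • The pattern that matches the text up to the next datetime or the end of the string is (?<text>.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z), where .*? matches any char, 0+ occurrences, reluctantly, up to the first occurrence of the timestamp pattern (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) or the end of the string (\Z).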
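
To make the second note concrete, here is a minimal sketch contrasting the reluctant .*? with a greedy .* in front of the same lookahead. It uses a deliberately simplified stand-in for the timestamp (four digits, a dash, two digits); the class name LazyLookaheadDemo, the texts helper, and the INPUT string are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyLookaheadDemo {
    // Simplified stand-in for the real timestamp pattern (assumption for brevity).
    static final String STAMP = "\\d{4}-\\d{2}";

    // Two entries; the first one spans two lines.
    static final String INPUT = "2017-03 first\nmore\n2017-04 second";

    // Collects the "text" group of every match, using the given quantified
    // expression (".*?" or ".*") as the body of the group.
    static List<String> texts(String body, String input) {
        Pattern p = Pattern.compile(
            STAMP + " (?<text>" + body + ")(?=" + STAMP + "|\\Z)",
            Pattern.DOTALL);
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group("text"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Reluctant: stops at the first position where the lookahead succeeds.
        System.out.println(texts(".*?", INPUT).size() + " entries (reluctant)");
        // Greedy: with DOTALL, runs to the end of input and satisfies \Z there.
        System.out.println(texts(".*", INPUT).size() + " entries (greedy)");
    }
}
```

With the reluctant quantifier each entry ends where the next stamp begins, so two texts are returned; with the greedy one, DOTALL lets .* swallow the second entry as well and the lookahead is satisfied only by \Z. A negated character class such as [^(...)] cannot express this boundary, because it excludes single characters rather than a whole sub-pattern.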