Improving a log file parser with multi-line match criteria

Time: 2015-08-03 19:40:13

Tags: java java-8 text-parsing linkedhashmap logfile-analysis

Given a somewhat peculiar log file, represented by the following snippet:

FILE (insert): file=Templates\xyz_EN_0615.pdf key=KEY_EN_AP_PAID
FILE (insert): file=Templates\xyz_DE_0615.pdf key=KEY_DE_STD_PAID
FILE (insert): file=Templates\xyz_DE_0615_free.pdf key=KEY_DE_STD_FREE
FILE (insert): file=Templates\xyz_IT_0615.pdf key=KEY_IT_STD_PAID
FILE (insert): file=Templates\xyz_IT_0615_free.pdf key=KEY_IT_STD_FREE
DEBUG: Opening Migration\abc_1.pdf
DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
Jul 31, 2015 5:07:54 PM java.util.prefs.WindowsPreferences <init>
WARNUNG: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
Jul 31, 2015 5:07:55 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNUNG: Using fallback font ArialMT for base font ZapfDingbats
DEBUG: Writing Migration\abc_1-migrated.pdf
PERFORMANCE: [OVERALL completed in 2303ms]
DEBUG: Opening Migration\abc_2_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
Field not available: Reset_1
Field not available: Print
DEBUG: Writing Migration\abc_2_DE-migrated.pdf
PERFORMANCE: [OVERALL completed in 756ms]
DEBUG: Opening Migration\abc_3_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
DEBUG: Writing Migration\abc_3-migrated.pdf
PERFORMANCE: [OVERALL completed in 660ms]
DEBUG: Opening Migration\abc_4.pdf
DEBUG: Opening Templates\xyz_EN_0615_free.pdf
null
DEBUG: Opening Migration\abc_5.pdf
DEBUG: Opening Templates\xyz_EN_0615_free.pdf
null
DEBUG: Opening Migration\abc_6_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
Field not available: Text6
Field not available: Text7
Field not available: Text8
Field not available: Text9
Field not available: Text10
Field not available: Text11
DEBUG: Writing Migration\abc_6-migrated.pdf
PERFORMANCE: [OVERALL completed in 686ms]
null
%EOF

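To analyse how accurately an automated PDF form-field migration service runs, I need to filter out and count all occurrences of the following 4-tuple: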

DEBUG: Opening Migration\abc_1.pdf
DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
DEBUG: Writing Migration\abc_1-migrated.pdf
PERFORMANCE: [OVERALL completed in 2303ms]

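Any number of lines may appear between the resulting 4-tuples. Those lines are either skipped or added to the list of invalid log entries. The simple selection criteria are hard-coded in the code below.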
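The two hard-coded criteria can also be read as a three-way classification of each line. The following is only an illustrative sketch. The enum `LineKind`, its constants and the method `of` are hypothetical names. The regular expressions are copied unchanged from the question's code. Compiling them once with `Pattern.compile` avoids recompiling the regex on every `String.matches` call.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the question's selection criteria as a three-way
// classification. Patterns are compiled once instead of per String.matches() call.
enum LineKind {
    SKIPPED,    // ignored entirely, only counted
    TUPLE,      // may be part of an Opening/Opening/Writing/PERFORMANCE 4-tuple
    OTHER;      // anything else ("null", "%EOF", ...) can never be part of a valid tuple

    private static final Pattern SKIP =
            Pattern.compile("^(FILE \\(insert\\):|WARNUNG|Field not available).*");
    private static final Pattern TUPLE_LINE =
            Pattern.compile("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*");

    static LineKind of(String line) {
        if (SKIP.matcher(line).matches() || line.endsWith("<init>")) {
            return SKIPPED;
        }
        if (TUPLE_LINE.matcher(line).matches()) {
            return TUPLE;
        }
        return OTHER;
    }

    public static void main(String[] args) {
        System.out.println(of("Field not available: Reset_1"));                 // SKIPPED
        System.out.println(of("DEBUG: Writing Migration\\abc_1-migrated.pdf"));  // TUPLE
        System.out.println(of("null"));                                          // OTHER
    }
}
```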

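Next, the log file should be split into valid and invalid entries, including their line numbers. Running the current program against the sample above prints: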

Statistics: Valid[tuples]=4 Valid[lines]=16 Invalid[lines]=8 Skipped[lines]=17 Total[lines]=41
----------------------[VALID]----------------------
key=6 value=DEBUG: Opening Migration\abc_1.pdf
key=7 value=DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
key=12 value=DEBUG: Writing Migration\abc_1-migrated.pdf
key=13 value=PERFORMANCE: [OVERALL completed in 2303ms]
key=14 value=DEBUG: Opening Migration\abc_2_DE.pdf
key=15 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=18 value=DEBUG: Writing Migration\abc_2_DE-migrated.pdf
key=19 value=PERFORMANCE: [OVERALL completed in 756ms]
key=20 value=DEBUG: Opening Migration\abc_3_DE.pdf
key=21 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=22 value=DEBUG: Writing Migration\abc_3-migrated.pdf
key=23 value=PERFORMANCE: [OVERALL completed in 660ms]
key=30 value=DEBUG: Opening Migration\abc_6_DE.pdf
key=31 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=38 value=DEBUG: Writing Migration\abc_6-migrated.pdf
key=39 value=PERFORMANCE: [OVERALL completed in 686ms]
----------------------[VALID]----------------------
----------------------[INVALID]----------------------
key=24 value=DEBUG: Opening Migration\abc_4.pdf
key=25 value=DEBUG: Opening Templates\xyz_EN_0615_free.pdf
key=26 value=null
key=27 value=DEBUG: Opening Migration\abc_5.pdf
key=28 value=DEBUG: Opening Templates\xyz_EN_0615_free.pdf
key=29 value=null
key=40 value=null
key=41 value=%EOF
----------------------[INVALID]----------------------

Here is my approach:

import org.testng.annotations.Test;

import java.io.*;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalyseMigrationLog {

    public class RingMap<K, V> extends LinkedHashMap<K, V> {
        private int cacheSize;

        public RingMap(int cacheSize) {
            super(cacheSize);
            this.cacheSize = cacheSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > cacheSize;
        }
    }

    @Test
    public void doAnalysis() throws IOException {
        final String logfile = "./run-simple.log";
        final int ringSize = 4;
        int lc = 0;
        int skipped = 0;
        Long count;
        String line;
        Map<Integer, String> circularFifo = new RingMap<>(ringSize);
        Map<Integer, String> validTuples = new LinkedHashMap<>();
        Map<Integer, String> invalidTuples = new LinkedHashMap<>();

        FileReader     fre = new FileReader(logfile);
        BufferedReader bre = new BufferedReader(fre);
        while ((line = bre.readLine ()) != null) {
            lc++;
            if (line.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*") || line.endsWith("<init>")) {
                skipped++;
                continue;
            }
            circularFifo.put(lc, line);
            if (circularFifo.size() < ringSize)
                continue;

            count = circularFifo.values().stream()
                    .filter(p -> p.matches("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*"))
                    .count();

            // Get the most recently inserted (newest) entry in the circular FIFO
            List<Map.Entry<Integer, String>> entryList = new ArrayList<>(circularFifo.entrySet());
            Map.Entry<Integer, String> lastEntry = entryList.get(entryList.size() - 1);

            if (count == ringSize && lastEntry.getValue().startsWith("PERFORMANCE:")) {
                validTuples.putAll(circularFifo);
                // Remove already pushed entries from invalidTuples list to avoid duplicate entries
                circularFifo.forEach((key, value) -> invalidTuples.remove(key));
                circularFifo.clear();
            } else {
                invalidTuples.putAll(circularFifo);
            }
        }
        // Put in the last entries that didn't fill up the circular fifo anymore.
        invalidTuples.putAll(circularFifo);
        bre.close();
        fre.close();

        System.out.printf("Statistics: Valid[tuples]=%s Valid[lines]=%s Invalid[lines]=%s Skipped[lines]=%s Total[lines]=%s%n",
                validTuples.size()/ringSize, validTuples.size(), invalidTuples.size(), skipped, lc);

        System.out.printf("----------------------[VALID]----------------------%n");
        validTuples.forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
        System.out.printf("----------------------[VALID]----------------------%n");

        System.out.printf("----------------------[INVALID]----------------------%n");
        invalidTuples.forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
        System.out.printf("----------------------[INVALID]----------------------%n");
    }
}

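The basic trick is to introduce a circular FIFO for this task. The solution is short, fast and works well. Even so, I wonder whether it could be converted more thoroughly to Java 8 features, for example by using NIO.2 and proper stream techniques. I don't want to use Guava or any other over-engineered library for such a simple task.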
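Here is a minimal sketch of the same sliding-window idea using only Java 8 collections. It assumes that `java.util.ArrayDeque` is an acceptable stand-in for the `LinkedHashMap`-based ring. The deque holds (line number, text) pairs. `pollFirst()` evicts the oldest entry, and `peekLast()` returns the newest one in O(1) without copying into a list. `WindowAnalysis`, `analyse` and its fields are hypothetical names. The criteria and the bookkeeping follow the question's `doAnalysis()`. That includes removing lines from the invalid map once they turn out to belong to a valid tuple.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: the question's circular-FIFO analysis with ArrayDeque as the ring.
// Class, method and field names are hypothetical.
final class WindowAnalysis {
    static final int RING_SIZE = 4;
    private static final Pattern SKIP =
            Pattern.compile("^(FILE \\(insert\\):|WARNUNG|Field not available).*");
    private static final Pattern TUPLE =
            Pattern.compile("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*");

    // Results keyed by 1-based line number, in file order.
    final Map<Integer, String> valid = new LinkedHashMap<>();
    final Map<Integer, String> invalid = new LinkedHashMap<>();
    int skipped;
    int total;

    static WindowAnalysis analyse(Iterable<String> lines) {
        WindowAnalysis r = new WindowAnalysis();
        Deque<Map.Entry<Integer, String>> window = new ArrayDeque<>();
        for (String line : lines) {
            int lineNo = ++r.total;
            if (SKIP.matcher(line).matches() || line.endsWith("<init>")) {
                r.skipped++;
                continue;
            }
            window.addLast(new AbstractMap.SimpleImmutableEntry<>(lineNo, line));
            if (window.size() > RING_SIZE) {
                window.pollFirst();                // evict the oldest entry
            }
            if (window.size() < RING_SIZE) {
                continue;                          // ring not full yet
            }
            boolean allTupleLines = window.stream()
                    .allMatch(e -> TUPLE.matcher(e.getValue()).matches());
            if (allTupleLines && window.peekLast().getValue().startsWith("PERFORMANCE:")) {
                for (Map.Entry<Integer, String> e : window) {
                    r.valid.put(e.getKey(), e.getValue());
                    r.invalid.remove(e.getKey());  // may have been marked invalid by an earlier window
                }
                window.clear();
            } else {
                window.forEach(e -> r.invalid.put(e.getKey(), e.getValue()));
            }
        }
        // Lines left in a ring that never filled up again are invalid.
        window.forEach(e -> r.invalid.put(e.getKey(), e.getValue()));
        return r;
    }

    public static void main(String[] args) {
        WindowAnalysis r = analyse(Arrays.asList(
                "FILE (insert): file=Templates\\xyz_EN_0615.pdf key=KEY_EN_AP_PAID",
                "DEBUG: Opening Migration\\abc_1.pdf",
                "DEBUG: Opening Templates\\xyz_DE_0615_kostenlos.pdf",
                "WARNUNG: Using fallback font ArialMT for base font ZapfDingbats",
                "DEBUG: Writing Migration\\abc_1-migrated.pdf",
                "PERFORMANCE: [OVERALL completed in 2303ms]",
                "DEBUG: Opening Migration\\abc_4.pdf",
                "null",
                "%EOF"));
        // prints: Statistics: Valid[tuples]=1 Valid[lines]=4 Invalid[lines]=3 Skipped[lines]=2 Total[lines]=9
        System.out.printf("Statistics: Valid[tuples]=%s Valid[lines]=%s Invalid[lines]=%s Skipped[lines]=%s Total[lines]=%s%n",
                r.valid.size() / RING_SIZE, r.valid.size(), r.invalid.size(), r.skipped, r.total);
    }
}
```

`analyse` accepts any `Iterable<String>`, so NIO.2 can feed it in two ways. You can pass `Files.readAllLines(path, StandardCharsets.UTF_8)` directly. Alternatively, read lazily from `Files.lines(path)` inside a try-with-resources block as `analyse(lines::iterator)`. The window logic is deliberately kept as an ordinary loop, because it is stateful and order-dependent. The `java.util.stream` documentation asks for stateless behavioral parameters, which this logic cannot be.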

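In particular, I don't like how the solution above retrieves the last (most recently added) entry of the circular FIFO. How could I extend and use the inner class along the following lines: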

public class RingMap<K, V> extends LinkedHashMap<K, V> {
    private int cacheSize;

    public RingMap(int cacheSize) {
        super(cacheSize);
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > cacheSize;
    }

    //TODO: how exactly would this work?
    public <K, V> Map.Entry<K,V> getLast(LinkedHashMap<K, V> map) {
        Map.Entry<K, V> result = null;
        for (Map.Entry<K, V> kvEntry : map.entrySet()) {
            result = kvEntry;
        }
        return result;
    }
}
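Here is one way the `getLast` TODO could be resolved, offered as a sketch. Make it a plain instance method that uses the class's own `K` and `V`. The version in the question declares new type parameters `<K, V>` on the method, which shadow the class's own. It also takes the map as an argument, although the method already lives on the map. Returning `Optional` and using the `reduce((first, second) -> second)` idiom are illustrative choices, not part of the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Sketch: the question's RingMap with getLast() as an instance method that
// reuses the class type parameters instead of redeclaring (shadowing) them.
class RingMap<K, V> extends LinkedHashMap<K, V> {
    private final int cacheSize;

    RingMap(int cacheSize) {
        super(cacheSize);
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > cacheSize;   // evict the oldest entry once capacity is exceeded
    }

    // Newest entry in insertion order; walks the entries (O(n), fine for a tiny ring).
    Optional<Map.Entry<K, V>> getLast() {
        return entrySet().stream().reduce((first, second) -> second);
    }

    public static void main(String[] args) {
        RingMap<Integer, String> ring = new RingMap<>(4);
        for (int i = 1; i <= 6; i++) {
            ring.put(i, "line " + i);
        }
        System.out.println(ring.keySet());                    // [3, 4, 5, 6]
        System.out.println(ring.getLast().get().getValue());  // line 6
    }
}
```

In insertion-order mode, re-putting an existing key does not move it to the end of a `LinkedHashMap`. That is harmless here, because the keys are strictly increasing line numbers. On Java 21 or later, `LinkedHashMap` implements `SequencedMap`, whose `lastEntry()` returns the newest entry directly.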

Next, I would really like to use NIO.2 features, but I don't see how best to integrate them into my solution. Something like:

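import org.testng.annotations.Test;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

@Test
public void doAnalysisNIO2() throws IOException {
    final String logfile = "./run-simple.log";

    Path path = Paths.get(logfile);
    try (Stream<String> filteredLines = Files.lines(path, StandardCharsets.UTF_8)
            .onClose(() -> System.out.println("Stream has been closed!"))
            .filter(s -> !(s.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*") ||
                           s.endsWith("<init>")))) {
        // TODO: do the same thing as in doAnalysis(); note that the original
        // line numbers are no longer available once the lines have been filtered
        filteredLines.forEach((l) -> System.out.printf("line = %s%n", l));
    }
}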
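In the NIO.2 snippet above, the filter runs before anything records line numbers. As a result, the `key=` values in the expected output can no longer be produced. The sketch below numbers the lines first and then applies the same skip filter. It assumes the whole log fits in memory, since it uses `Files.readAllLines`. `NumberedFilter`, `numberAndFilter` and `readFiltered` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: number lines first, then filter, so line numbers survive.
final class NumberedFilter {
    // Same skip criterion as in the question.
    static boolean isSkipped(String s) {
        return s.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*") || s.endsWith("<init>");
    }

    // Maps 1-based line number -> text for every non-skipped line, in file order.
    static Map<Integer, String> numberAndFilter(List<String> all) {
        return IntStream.range(0, all.size())
                .filter(i -> !isSkipped(all.get(i)))
                .boxed()
                .collect(Collectors.toMap(
                        i -> i + 1,             // line numbers start at 1
                        i -> all.get(i),
                        (a, b) -> a,            // keys are unique, so this never runs
                        LinkedHashMap::new));   // keep file order
    }

    static Map<Integer, String> readFiltered(Path path) throws IOException {
        return numberAndFilter(Files.readAllLines(path, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "./run-simple.log");
        if (!Files.exists(path)) {
            System.out.println("No such file: " + path);
            return;
        }
        readFiltered(path).forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
    }
}
```

With UTF-8, `Files.readAllLines` throws `MalformedInputException` if the log was written in a Windows code page and contains non-ASCII characters. In that case, pass the matching `Charset` instead. For logs too large to load at once, `java.io.LineNumberReader`, or a manual counter in a loop over `Files.newBufferedReader`, keeps the numbering without holding the whole file in memory.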
