Parsing GC logs in Python

Time: 2018-11-09 19:13:04

Tags: python regex logparser

Here is my sample GC log:

2018-11-07T10:23:48.445+0000: 61292.406: [Full GC (Ergonomics) [PSYoungGen: 31552K->0K(16692224K)] [ParOldGen: 50295013K->6237441K(50331648K)] 50326565K->6237441K(67023872K) [PSPermGen: 51953K->51859K(262144K)], 4.2519270 secs] [Times: user=53.34 sys=2.52, real=4.26 secs]
2018-11-08T17:07:23.830+0000: 171907.790: [Full GC (Ergonomics) [PSYoungGen: 61350K->0K(16698368K)] [ParOldGen: 50313713K->7350597K(50331648K)] 50375063K->7350597K(67030016K) [PSPermGen: 52217K->52147K(262144K)], 3.8021880 secs] [Times: user=60.01 sys=0.17, real=3.80 secs]
2018-11-07T20:09:28.162+0000: 72186.081: [GC (Allocation Failure) [PSYoungGen: 16527616K->13908K(16641536K)] 20157858K->3651137K(32370176K), 0.1829187 secs] [Times: user=3.22 sys=0.02, real=0.18 secs]
2018-11-07T20:26:39.304+0000: 73217.223: [GC (Allocation Failure) [PSYoungGen: 16530004K->12288K(16669696K)] 20167233K->3658872K(32398336K), 0.1700098 secs] [Times: user=2.95 sys=0.02, real=0.17 secs]
2018-11-07T20:53:56.935+0000: 74854.855: [GC (Allocation Failure) [PSYoungGen: 16566272K->12644K(16661504K)] 20212856K->3666239K(32390144K), 0.1757810 secs] [Times: user=3.09 sys=0.02, real=0.18 secs]
2018-11-07T21:11:43.359+0000: 75921.279: [GC (Allocation Failure) [PSYoungGen: 16566628K->11904K(16685056K)] 20220223K->3673363K(32413696K), 0.1464264 secs] [Times: user=2.53 sys=0.02, real=0.14 secs]
2018-11-07T21:35:31.862+0000: 77349.782: [GC (Allocation Failure) [PSYoungGen: 16597632K->11872K(16677888K)] 20259091K->3680475K(32406528K), 0.1539087 secs] [Times: user=2.67 sys=0.03, real=0.15 secs]
2018-11-07T22:00:06.604+0000: 78824.523: [GC (Allocation Failure) [PSYoungGen: 16597600K->13488K(16698368K)] 20266203K->3687924K(32427008K), 0.1748201 secs] [Times: user=3.07 sys=0.02, real=0.17 secs]
2018-11-07T22:24:21.694+0000: 80279.614: [GC (Allocation Failure) [PSYoungGen: 16626352K->13968K(16691712K)] 20300788K->3696724K(32420352K), 0.1621628 secs] [Times: user=2.85 sys=0.02, real=0.16 secs]
2018-11-07T22:45:44.177+0000: 81562.096: [GC (Allocation Failure) [PSYoungGen: 16626832K->10772K(16708608K)] 20309588K->3703288K(32437248K), 0.1612409 secs] [Times: user=2.82 sys=0.02, real=0.16 secs]
2018-11-07T23:10:31.320+0000: 83049.239: [GC (Allocation Failure) [PSYoungGen: 16646164K->11440K(16704000K)] 20338680K->3708716K(32432640K), 0.1824199 secs] [Times: user=3.20 sys=0.03, real=0.18 secs]
2018-11-07T23:37:17.932+0000: 84655.852: [GC (Allocation Failure) [PSYoungGen: 16646832K->9856K(16717312K)] 20344108K->3713660K(32445952K), 0.1891362 secs] [Times: user=3.29 sys=0.04, real=0.19 secs]

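As you can see, most log entries contain PSYoungGen, but I want to get the GC cycle time only for entries that are a Full GC.

I wrote the code below, but it returns every real= value in the log, not just those from Full GC entries.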

import re

log_file = '/Users/parse_log/full_gc_log_md.txt'
# Matches every real= value, not only those on Full GC lines
regex = r'real=\d.\d'

with open(log_file, 'r') as file:
    for line in file:
        for match in re.finditer(regex, line, re.S):
            match_text = match.group()
            print(match_text)

I also tried regex = '.Full GC.real=\d.\d' as the regular expression, but that did not work either.

1 Answer:

Answer 0: (score: 2)

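Your regex needs a few corrections. You need to make sure the line contains the text Full GC, and to match the complete number you need \d+ rather than \d. The decimal point should also be escaped as \. so that it matches only a literal dot. Try this regex:

Full GC.*?real=(\d+\.\d+)

Explanation:

  • Full GC.*? -> matches the literal "Full GC" followed by as little text as possible
  • real= -> matches the literal real=
  • (\d+\.\d+) -> captures the seconds value you are interested in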
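
To illustrate why \d alone falls short, here is a minimal sketch. The shortened `line` string is a made-up fragment modeled on the sample log, not copied from it verbatim.

```python
import re

# Hypothetical shortened log fragment, modeled on the sample lines above.
line = "[Full GC (Ergonomics) ..., 4.2519270 secs] [Times: user=53.34 sys=2.52, real=4.26 secs]"

# \d matches exactly one digit, so the second decimal place is cut off.
single_digit = re.findall(r"real=\d.\d", line)

# \d+ matches one or more digits; \. matches a literal dot.
full_number = re.findall(r"real=(\d+\.\d+)", line)

print(single_digit)  # ['real=4.2']
print(full_number)   # ['4.26']
```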


Here is sample Python code that applies the same regex to three lines of input:

import re
s = '2018-11-07T10:23:48.445+0000: 61292.406: [Full GC (Ergonomics) [PSYoungGen: 31552K->0K(16692224K)] [ParOldGen: 50295013K->6237441K(50331648K)] 50326565K->6237441K(67023872K) [PSPermGen: 51953K->51859K(262144K)], 4.2519270 secs] [Times: user=53.34 sys=2.52, real=4.26 secs]\n2018-11-08T17:07:23.830+0000: 171907.790: [Full GC (Ergonomics) [PSYoungGen: 61350K->0K(16698368K)] [ParOldGen: 50313713K->7350597K(50331648K)] 50375063K->7350597K(67030016K) [PSPermGen: 52217K->52147K(262144K)], 3.8021880 secs] [Times: user=60.01 sys=0.17, real=3.80 secs]\n2018-11-07T20:09:28.162+0000: 72186.081: [GC (Allocation Failure) [PSYoungGen: 16527616K->13908K(16641536K)] 20157858K->3651137K(32370176K), 0.1829187 secs] [Times: user=3.22 sys=0.02, real=0.18 secs]'
results = re.findall(r'Full GC.*?real=(\d+\.\d+)', s)
print(results)

This prints the following output:

['4.26', '3.80']
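
Applied to the line-by-line loop from the question, the same pattern might look like the sketch below. The function name `full_gc_real_times` and the shortened `sample` lines are hypothetical, and the sketch assumes each GC event occupies a single line, as in the sample log.

```python
import re

# Same idea as the answer's pattern, compiled once for reuse.
FULL_GC_REAL = re.compile(r"Full GC.*?real=(\d+\.\d+)")

def full_gc_real_times(lines):
    """Return the real= seconds (as floats) for every Full GC line."""
    times = []
    for line in lines:
        match = FULL_GC_REAL.search(line)
        if match:  # lines without "Full GC" simply do not match
            times.append(float(match.group(1)))
    return times

# Hypothetical shortened lines, modeled on the sample log.
sample = [
    "... [Full GC (Ergonomics) ... 4.2519270 secs] [Times: user=53.34 sys=2.52, real=4.26 secs]",
    "... [GC (Allocation Failure) ... 0.1829187 secs] [Times: user=3.22 sys=0.02, real=0.18 secs]",
]
print(full_gc_real_times(sample))  # [4.26]
```

Because a file object yields its lines when iterated, an open log file can be passed in directly, for example `with open(log_file) as f: times = full_gc_real_times(f)`.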