Extracting the file name and common name from an Apache log with Python

Date: 2017-09-25 15:08:58

Tags: python apache parsing logging

I am trying to parse an Apache log file and want to use Python to extract the AD common name and the file name from access.log.

My access.log file looks like this:

[01/Jan/1901:12:00:01] 12.34.56.78 TLS Protocol EncryptionMethod "GET/.../filename.zip HTTP/1.1" "CN=Smith John A,......"

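I want to extract the data in the following format: Smith John A, filename.zip

I have tried several custom Python Apache log parsers from GitHub without any luck.

Any ideas on how to achieve this?

Thanks.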

1 answer:

Answer 0 (score: 1)

This is really basic.

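import re

with open('access.log') as log:
    for line in log:  # iterate lazily; readlines() is not needed
        # Collect the double-quoted fields: the request and the certificate subject.
        results = [m.group() for m in re.finditer(r'"([^"]*)"', line)]
        if len(results) == 2:
            print(results)
        else:
            print(line)
            print("**** can't parse")
            continue
        count = 0
        # File name: the last path component of the GET request.
        # [^/\s]+ also accepts digits and hyphens, which [a-z._]+ would cut off.
        m = re.search(r'GET\s*/(?:\S*/)?([^/\s]+)\s', line, re.I)
        if m:
            filename = m.group(1)
            count += 1
        else:
            filename = ''
        # Common name: everything between "CN=" and the first comma.
        m = re.search(r'CN=([^,]+),', line, re.I)
        if m:
            name = m.group(1)
            count += 1
        else:
            name = ''
        print(name, filename)
        if count != 2:
            print("***can't parse filename or name")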

Untested!

Result for that one-line file:

['"GET/.../filename.zip HTTP/1.1"', '"CN=Smith John A,......"']
Smith John A filename.zip
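
The question asked for the output as "Smith John A, filename.zip", with a comma. Below is a minimal sketch of one way to get that format. It assumes every line looks like the single sample in the question, with the quoted request followed by the quoted certificate subject. LINE_RE and extract are names made up for this example, not part of any library.

```python
import re

# Hypothetical pattern, based only on the one sample line in the question:
# the quoted GET request comes first, then the quoted certificate subject.
LINE_RE = re.compile(
    r'"GET\s*(?P<path>\S+)\s+HTTP/[\d.]+"'  # request: method, path, protocol
    r'\s+"CN=(?P<cn>[^,"]+)'                # subject: common name up to the first comma
)

def extract(line):
    """Return 'Common Name, filename' for one log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # The file name is whatever follows the last slash of the request path.
    filename = m.group('path').rsplit('/', 1)[-1]
    return '{}, {}'.format(m.group('cn').strip(), filename)

# The sample line from the question, copied as-is.
sample = ('[01/Jan/1901:12:00:01] 12.34.56.78 TLS Protocol EncryptionMethod '
          '"GET/.../filename.zip HTTP/1.1" "CN=Smith John A,......"')
print(extract(sample))
```

To process a whole file, call extract on each line of open('access.log') and skip the lines for which it returns None. Real logs may contain other request methods or a different field order, in which case the pattern would need adjusting.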