I'm using the field-mapping generator function from http://www.dabeaz.com/generators/fieldmap.py:

#!/usr/bin/env python
def field_map(dictseq, name, func):
    for d in dictseq:
        d[name] = func(d[name])
        yield d

if __name__ == '__main__':
    loglines = open("test.log")
    import re
    logpats = r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] \"(.*?)\" (\S+) (\S+) \"(.*?)\" \"(.*?)\" (\S+) \"(.*?)\" \"(.*?)\" (\S+)'
    logpat = re.compile(logpats)
    groups = (logpat.match(line) for line in loglines)
    tuples = (g.groups() for g in groups if g)
    #for t in tuples:
    #    print t
    colnames = ('record_id', 'elapsed_time', 'client', 'username', 'client_id', 'date',
                'http_method_url', 'status', 'size', 'http_referer', 'useragent', 'mime',
                'filter_name_reason', 'profiles', 'ipport')
    log = (dict(zip(colnames, t)) for t in tuples)
    log = field_map(log, "status", int)
    log = field_map(log, "size", lambda s: int(s) if s != '-' else 0)
    for x in log:
        print x
It fails with the following error. Any ideas?
[root@cumbria extended]# python fieldmap.py
Traceback (most recent call last):
  File "fieldmap.py", line 24, in <module>
    for x in log:
  File "fieldmap.py", line 4, in field_map
    for d in dictseq:
  File "fieldmap.py", line 5, in field_map
    d[name] = func(d[name])
ValueError: invalid literal for int() with base 10: 'status'
test.log contains data in this format:
"1356313509.519-6-10.66.54.21-8080" 2089 10.112.151.213 "anonymous@10.112.151.213" "6" [24/Dec/2012:01:45:11] "GET http://apps.facebook.com:80/thesimssocial/?fb_source=bookmark_apps&ref=bookmarks&count=2&fb_bmpos=4_2" 200 58300 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 BMID/E679E9E153" text/html "- -" "M&B-112,HTTP,QUERIES,uncachable,antivirus,REDIRECT_THIS" "10.66.54.21:8080"
Answer 0 (score: 1)
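The first line of test.log is probably a header that contains the field names rather than their values. That would explain why int() receives 'status' instead of a value such as '200'.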
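To illustrate that diagnosis, here is a minimal sketch that feeds a header-like line through the pattern from the question. The HEADER string is hypothetical: the real first line of test.log is not shown in the question, so this only demonstrates how such a line could slip past the `\S+` groups and reach int().

```python
import re

# Pattern from the question: status and size are both matched with \S+,
# so any non-whitespace token is accepted in those positions.
LOGPAT = re.compile(
    r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\S+) (\S+) '
    r'"(.*?)" "(.*?)" (\S+) "(.*?)" "(.*?)" (\S+)'
)

COLNAMES = ('record_id', 'elapsed_time', 'client', 'username', 'client_id',
            'date', 'http_method_url', 'status', 'size', 'http_referer',
            'useragent', 'mime', 'filter_name_reason', 'profiles', 'ipport')

# HYPOTHETICAL header line (an assumption, not taken from the real test.log).
HEADER = ('record_id elapsed_time client username client_id [date] '
          '"http_method_url" status size "http_referer" "useragent" mime '
          '"filter_name_reason" "profiles" ipport')

match = LOGPAT.match(HEADER)
record = dict(zip(COLNAMES, match.groups()))

# The header matched, so the "status" field holds a name, not a number.
try:
    int(record["status"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Under that assumption, `record["status"]` is the string 'status', and converting it raises the same kind of ValueError as in the traceback.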
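You can make the regular expression more selective so that unsuitable lines are filtered out early, for example by using \d+ to match the HTTP status.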
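A sketch of that suggestion, assuming the only change needed is replacing the eighth group (the status) with `(\d+)`. The names STRICT_LOGPAT, HEADER and parse are introduced here for illustration; HEADER is a hypothetical header line, and SAMPLE is the data line quoted in the question.

```python
import re

def field_map(dictseq, name, func):
    # Same generator as in the question: convert one field of each record.
    for d in dictseq:
        d[name] = func(d[name])
        yield d

# Identical to the question's pattern except that the status group is (\d+),
# so lines whose status field is not numeric never match.
STRICT_LOGPAT = re.compile(
    r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\d+) (\S+) '
    r'"(.*?)" "(.*?)" (\S+) "(.*?)" "(.*?)" (\S+)'
)

COLNAMES = ('record_id', 'elapsed_time', 'client', 'username', 'client_id',
            'date', 'http_method_url', 'status', 'size', 'http_referer',
            'useragent', 'mime', 'filter_name_reason', 'profiles', 'ipport')

# HYPOTHETICAL header line (an assumption, not taken from the real test.log).
HEADER = ('record_id elapsed_time client username client_id [date] '
          '"http_method_url" status size "http_referer" "useragent" mime '
          '"filter_name_reason" "profiles" ipport')

# Data line as quoted in the question.
SAMPLE = (
    '"1356313509.519-6-10.66.54.21-8080" 2089 10.112.151.213 '
    '"anonymous@10.112.151.213" "6" [24/Dec/2012:01:45:11] '
    '"GET http://apps.facebook.com:80/thesimssocial/?fb_source=bookmark_apps'
    '&ref=bookmarks&count=2&fb_bmpos=4_2" 200 58300 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 '
    '(KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 BMID/E679E9E153" '
    'text/html "- -" '
    '"M&B-112,HTTP,QUERIES,uncachable,antivirus,REDIRECT_THIS" '
    '"10.66.54.21:8080"'
)

def parse(lines):
    # Non-matching lines (such as the header) are dropped by "if g".
    groups = (STRICT_LOGPAT.match(line) for line in lines)
    tuples = (g.groups() for g in groups if g)
    log = (dict(zip(COLNAMES, t)) for t in tuples)
    log = field_map(log, "status", int)
    log = field_map(log, "size", lambda s: int(s) if s != '-' else 0)
    return log

records = list(parse([HEADER, SAMPLE]))
```

Because the existing pipeline already skips lines where match() returns None, tightening the pattern is enough to keep the header out of field_map; no separate header-skipping step is needed in this sketch.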