Python generator: invalid literal for int()

Time: 2012-12-24 21:00:53

Tags: python linux

I am using the field-mapping generator function from http://www.dabeaz.com/generators/fieldmap.py
#!/usr/bin/env python

def field_map(dictseq, name, func):
    for d in dictseq:
        d[name] = func(d[name])
        yield d

if __name__ == '__main__':
    loglines = open("test.log")
    import re
    logpats = r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] \"(.*?)\" (\S+) (\S+) \"(.*?)\" \"(.*?)\" (\S+) \"(.*?)\" \"(.*?)\" (\S+)'
    logpat = re.compile(logpats)
    groups = (logpat.match(line) for line in loglines)
    tuples = (g.groups() for g in groups if g)
    #for t in tuples:
    #    print t

    colnames = ('record_id', 'elapsed_time', 'client', 'username' , 'client_id','date',
                'http_method_url', 'status', 'size', 'http_referer','useragent', 'mime',
                'filter_name_reason', 'profiles', 'ipport')
    log = (dict(zip(colnames,t)) for t in tuples)
    log = field_map(log,"status",int)
    log = field_map(log,"size",lambda s: int(s) if s != '-' else 0)
    for x in log:
        print x

It gives this error. Any ideas?

[root@cumbria extended]# python fieldmap.py
Traceback (most recent call last):
  File "fieldmap.py", line 24, in <module>
    for x in log:
  File "fieldmap.py", line 4, in field_map
    for d in dictseq:
  File "fieldmap.py", line 5, in field_map
    d[name] = func(d[name])
ValueError: invalid literal for int() with base 10: 'status'

test.log contains data in this format:

"1356313509.519-6-10.66.54.21-8080" 2089 10.112.151.213 "anonymous@10.112.151.213" "6" [24/Dec/2012:01:45:11] "GET http://apps.facebook.com:80/thesimssocial/?fb_source=bookmark_apps&ref=bookmarks&count=2&fb_bmpos=4_2" 200 58300 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 BMID/E679E9E153" text/html "- -" "M&B-112,HTTP,QUERIES,uncachable,antivirus,REDIRECT_THIS" "10.66.54.21:8080"

1 answer:

Answer 0 (score: 1)

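The first line of test.log is probably a header that contains the field names rather than their values. The pattern uses \S+ for the status field, so a header line matches too, and int() is then called on the string 'status' instead of something like '200'.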
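
To illustrate how that could happen, here is a minimal sketch. The pattern is the one from the question, but the header line is purely hypothetical: the question does not show the real first line of test.log, so its exact layout is an assumption.

```python
import re

# Pattern copied from the question.
logpats = r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] \"(.*?)\" (\S+) (\S+) \"(.*?)\" \"(.*?)\" (\S+) \"(.*?)\" \"(.*?)\" (\S+)'
logpat = re.compile(logpats)

# HYPOTHETICAL header line (assumption: the real header is not shown in the question).
header = ('record_id elapsed_time client username client_id [date] '
          '"http_method_url" status size "http_referer" "useragent" mime '
          '"filter_name_reason" "profiles" ipport')

m = logpat.match(header)
# \S+ accepts any run of non-whitespace, so the header matches
# and group 8 (the status column) holds the literal word 'status'.
status_field = m.group(8)

try:
    int(status_field)
    error_message = None
except ValueError as exc:
    # Same kind of failure as in the traceback from the question.
    error_message = str(exc)

print(status_field)
print(error_message)
```

If something like this is the cause, printing the first few tuples (the commented-out loop in the question) should show the offending line.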

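You could make the regular expression more selective so that unsuitable lines are filtered out early, for example by using \d+ to match the HTTP status.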
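
A rough sketch of that idea follows, under some assumptions: the status is always numeric, the size is either numeric or '-' (which the lambda in the question already handles), and the header line looks like the hypothetical one below. The names `strict_logpat` and `parse_lines` are made up for this example, and the data line is a shortened version of the sample from the question.

```python
import re

# Same layout as the question's pattern, but status must be digits
# and size must be digits or '-' (assumption based on the question's lambda).
strict_logpat = re.compile(
    r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\d+) (\d+|-) '
    r'"(.*?)" "(.*?)" (\S+) "(.*?)" "(.*?)" (\S+)'
)

def parse_lines(lines):
    """Yield the captured fields for lines that match; skip everything else."""
    for line in lines:
        m = strict_logpat.match(line)
        if m:
            yield m.groups()

sample = [
    # HYPOTHETICAL header line; the real one is not shown in the question.
    'record_id elapsed_time client username client_id [date] '
    '"http_method_url" status size "http_referer" "useragent" mime '
    '"filter_name_reason" "profiles" ipport',
    # Shortened version of the data line from the question.
    '"1356313509.519-6-10.66.54.21-8080" 2089 10.112.151.213 '
    '"anonymous@10.112.151.213" "6" [24/Dec/2012:01:45:11] '
    '"GET http://apps.facebook.com:80/thesimssocial/" 200 58300 "-" '
    '"Mozilla/5.0" text/html "- -" "M&B-112,HTTP" "10.66.54.21:8080"',
]

rows = list(parse_lines(sample))
for row in rows:
    # Index 7 is the status column, index 8 the size column.
    print(row[7], row[8])
```

With this pattern the header never reaches field_map, so int() only sees numeric status values. The trade-off is that any malformed data line is dropped silently as well, so it may be worth counting or logging the lines that fail to match.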