问题
我是python的新手,我正尝试(使用python)浏览大量大型自定义日志文件,以从某些GET请求中提取参数,并尝试从中获取som统计信息。
我正在解析的日志文件如下:
80 172.23.131.149 "2018-07-05 13:08:25 860" "POST /bios/servlet/bios.servlets.sso.WaffleLoginServlet HTTP/1.1" 401 5 891 891 "-" "Java/1.8.0_171"
8080 172.23.131.251 "2018-07-05 13:08:26 594" "HEAD /bios/servlet/bios.servlets.web.Ping?level=3 HTTP/1.0" 200 - 1953 1953 "-" "-"
8080 172.23.131.252 "2018-07-05 13:08:26 594" "HEAD /bios/servlet/bios.servlets.web.Ping?level=3 HTTP/1.0" 200 - 953 953 "-" "-"
80 172.23.131.149 "2018-07-05 13:08:28 188" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156240.234375%2C6576777.34375%2C156269.53125%2C6576806.640625 HTTP/1.1" 200 133210 3547 3516 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 188" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156240.234375%2C6576748.046875%2C156269.53125%2C6576777.34375 HTTP/1.1" 200 108066 3547 3532 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 188" "POST /bios/servlet/bios.servlets.GetGeometryComponents HTTP/1.1" 401 4 2484 2484 "-" "Java/1.8.0_171"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156210.9375%2C6576806.640625%2C156240.234375%2C6576835.9375 HTTP/1.1" 200 123953 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156210.9375%2C6576777.34375%2C156240.234375%2C6576806.640625 HTTP/1.1" 200 147132 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156269.53125%2C6576777.34375%2C156298.828125%2C6576806.640625 HTTP/1.1" 200 145701 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
我想做什么?
我很难让上面的1号工作。 我找不到任何有用的信息(可能不是在寻找正确的东西)。问题似乎是单词“ GetMap”位于较长的字符串中。但这听起来有点容易,但是我无法弄清楚该怎么做。
我现在在上面的任务列表中用于执行数字2和3的代码是:
#!/usr/bin/env python3
import os
import re
from collections import Counter
# regular expression
rexp = r"(^.+[LAYERSlayers]=(?P<domain>.*?)&)" # sök efter LAYERS= eller layer=
# create counter dictionary
cnt_domains = Counter()
path = '/home/uwestephan/Logg-file-parsing/ws00524'
matched = 0
failed = 0
for filename in os.listdir(path):
filmedsokvag = (path+"/"+filename)
print (filmedsokvag)
# read file / gather data
f = open(filmedsokvag, 'r')
for line in f:
m = re.match(rexp, line)
if m:
cnt_domains.update([m.group('domain')])
matched += 1
else:
failed += 1
# Output Results
print('[*] %d lines matched the regular expression' % (matched))
print('[*] %d lines failed to match the regular expression' % (failed), end='\n\n')
print('[*] ============================================')
print('[*] 100 Most Frequently Occurring of Lager Queried')
print('[*] ============================================')
for domain, count in cnt_domains.most_common(100):
print('[*] %30s: %d' % (domain, count))
print('[*] ============================================')
# Output results to file
with open('parseroutput.txt', 'w') as fd:
print('[*] %d lines matched the regular expression' % (matched), file=fd)
print('[*] %d lines failed to match the regular expression' % (failed), end='\n\n', file=fd)
print('[*] ============================================', file=fd)
print('[*] 100 Most Frequently Occurring Lager Queried', file=fd)
print('[*] ============================================', file=fd)
for domain, count in cnt_domains.most_common(100):
print('[*] %30s: %d' % (domain, count), file=fd)
print('[*] ============================================', file=fd)
您对如何提取GetMap请求有疑问吗? 先感谢您!
答案 0 :(得分:0)
检查行是否包含'GetMap'
,如果不包含则跳过行。
for line in f:
if 'GetMap' in line: # check for 'GetMap'
m = re.match(rexp, line)
if m:
cnt_domains.update([m.group('domain')])
matched += 1
else:
failed += 1