从日志文件i python

时间:2018-08-07 12:40:44

标签: python python-3.x parsing

我是python的新手,我正尝试(使用python)浏览大量大型自定义日志文件,以从某些GET请求中提取参数,并尝试从中获取som统计信息。我有点遥远,但我遇到了两个问题,我和我的同事无法弄清他们为什么让我们如此头痛。

80 172.23.131.149 "2018-07-05 13:08:25 860" "POST /bios/servlet/bios.servlets.sso.WaffleLoginServlet HTTP/1.1" 401 5 891 891 "-" "Java/1.8.0_171"
8080 172.23.131.251 "2018-07-05 13:08:26 594" "HEAD /bios/servlet/bios.servlets.web.Ping?level=3 HTTP/1.0" 200 - 1953 1953 "-" "-"
8080 172.23.131.252 "2018-07-05 13:08:26 594" "HEAD /bios/servlet
bios.servlets.web.Ping?level=3 HTTP/1.0" 200 - 953 953 "-" "-"
80 172.23.131.149 "2018-07-05 13:08:28 188" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156240.234375%2C6576777.34375%2C156269.53125%2C6576806.640625 HTTP/1.1" 200 133210 3547 3516 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 188" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156240.234375%2C6576748.046875%2C156269.53125%2C6576777.34375 HTTP/1.1" 200 108066 3547 3532 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 188" "POST /bios/servlet/bios.servlets.GetGeometryComponents HTTP/1.1" 401 4 2484 2484 "-" "Java/1.8.0_171"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156210.9375%2C6576806.640625%2C156240.234375%2C6576835.9375 HTTP/1.1" 200 123953 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156210.9375%2C6576777.34375%2C156240.234375%2C6576806.640625 HTTP/1.1" 200 147132 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156269.53125%2C6576777.34375%2C156298.828125%2C6576806.640625 HTTP/1.1" 200 145701 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.137.120 "2018-07-06 10:04:32 856" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_GRA?FORMAT=image%2Fpng&TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&STYLES=&SRS=EPSG%3A5850&BBOX=150000,6580000,151875,6581875&WIDTH=256&HEIGHT=256 HTTP/1.1" 200 58443 0 0 "https://iservice.stockholm.se/open/TyckTill/Pages/TyckTill.aspx?systemId=synpunktsportalen" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
80 172.23.137.120 "2018-07-06 10:04:25 400" "GET /bios/dpwebmap/cust_sth/slk/tycktill/app.htmlclient.gwt.DPWebApp.nocache.js HTTP/1.1" 200 3924 0 0 "https://iservice.stockholm.se/open/TyckTill/Pages/TyckTill.aspx?systemId=synpunktsportalen" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"

我想做什么

  1. 使用字符串“ REQUEST = GetMap”提取所有行的服务名称。服务的名称在字符串“ / bios / wms / app /”和第一个问题标记“?”之间在行中。我提到它必须是第一个问号,因为每行中可以有多个问号(例如,我的日志示例中的倒数第二行有两个问号)。

我使用的正则表达式是:

rexp_name = r"(^.+/bios/wms/app/(?P<name>.+)\?)"

我的问题是我的正则表达式只是找到行中的最后一个问号,并返回“ / bios / wms / app /”和最后一个问号之间的整个字符串,而不仅仅是第一个问号。对于带有几个问号的行,该名称变得很长。

您是否有办法知道在第一个问号之后停止正则表达式?

0 个答案:

没有答案