Python regex to match Apache LogFormat "combinedvhost"

Time: 2017-01-06 16:45:52

Tags: python regex

LogFormat "%v %a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combinedvhost
CustomLog "/var/log/apache2/access_log" combinedvhost    

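I have an Apache configuration that writes an access_log in the log format above. I am trying to write a Python (2.7.13) regex that captures each field in a named group (ignoring the HTTP method and HTTP version).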

Here is my regex:

(?P<host>.*)\s+(?P<ip>\S+)\s+-\s+-\s+\[(?P<date>\S+)\s+(?P<timezone>.*)\]\s+"\S+\s+(?P<path>\S+)(?:\?(?P<querystring>\S+))?\s+\S+"\s+(?P<status>\S+)\s+(?P<length>\S+)\s+"(?P<referrer>.*)"\s+"(?P<user_agent>.*)"\s+

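My problem is with the first log line, where the expected result is path = / and querystring = simplode_ajax=true&simplode_query%5Border%5D=DESC. It seems that my path group is matching greedily: querystring comes back as None, and the whole string is returned as path instead...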

I am testing the regex above against the log lines below at http://pythex.org.

default 1.2.3.4 - - [05/Jan/2017:10:56:18 -0800] "GET /?simplode_ajax=true&simplode_query%5Border%5D=DESC HTTP/1.1" 200 - "http://www.xxx.xx/xxx/xx/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
default 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] "GET /xxx/xx/06/22/xxxxx/ HTTP/1.1" 200 11098 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] "POST /xxxxxx.php HTTP/1.1" 200 370 "-" "-"
default 1.2.3.4 - - [05/Jan/2017:10:56:23 -0800] "GET /blog/xxx/01/22/xxxxx/ HTTP/1.1" 200 14404 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:24 -0800] "GET /blog/xxxxx/ HTTP/1.1" 200 21901 "https://www.codingmerc.com/blog/" "Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:25 -0800] "POST /xxxxx.php HTTP/1.1" 200 370 "-" "-"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:29 -0800] "GET /blog/xxxxx/ HTTP/1.1" 200 13831 "https://www.xxx.xx/blog/" "Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )"
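
The behaviour described above can also be reproduced outside pythex with a short script. This is a minimal sketch: the pattern is the one from the question, except that the trailing \s+ is dropped (an assumption of this sketch) so that a single line without a trailing newline can still match. The names PATTERN and line are illustrative only.

```python
import re

# The question's pattern, split across lines for readability.
# Assumption: the trailing \s+ is omitted so that a line with no
# trailing newline still matches.
PATTERN = re.compile(
    r'(?P<host>.*)\s+(?P<ip>\S+)\s+-\s+-\s+'
    r'\[(?P<date>\S+)\s+(?P<timezone>.*)\]\s+'
    r'"\S+\s+(?P<path>\S+)(?:\?(?P<querystring>\S+))?\s+\S+"\s+'
    r'(?P<status>\S+)\s+(?P<length>\S+)\s+'
    r'"(?P<referrer>.*)"\s+"(?P<user_agent>.*)"'
)

# First sample log line from the question.
line = (
    'default 1.2.3.4 - - [05/Jan/2017:10:56:18 -0800] '
    '"GET /?simplode_ajax=true&simplode_query%5Border%5D=DESC HTTP/1.1" '
    '200 - "http://www.xxx.xx/xxx/xx/" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

m = PATTERN.match(line)

# The greedy \S+ in the path group consumes the "?" and everything after it,
# so the optional querystring group matches nothing.
print(m.group('path'))         # /?simplode_ajax=true&simplode_query%5Border%5D=DESC
print(m.group('querystring'))  # None
```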

1 answer:

Answer 0 (score: 1)

It seems to work if you just make your path group non-greedy: replace (?P<path>\S+) with (?P<path>\S+?).
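
A minimal sketch of that change, checked against two of the sample lines from the question. The helper parse_line and the names LOG_RE, with_query and without_query are hypothetical, introduced only for illustration. As a further assumption, the trailing \s+ of the original pattern is relaxed to \s* so that a line without a trailing newline still matches.

```python
import re

# Same pattern as in the question, with two changes:
#   1. the path group uses the lazy quantifier \S+? (the fix)
#   2. the trailing \s+ is relaxed to \s* (assumption of this sketch)
LOG_RE = re.compile(
    r'(?P<host>.*)\s+(?P<ip>\S+)\s+-\s+-\s+'
    r'\[(?P<date>\S+)\s+(?P<timezone>.*)\]\s+'
    r'"\S+\s+(?P<path>\S+?)(?:\?(?P<querystring>\S+))?\s+\S+"\s+'
    r'(?P<status>\S+)\s+(?P<length>\S+)\s+'
    r'"(?P<referrer>.*)"\s+"(?P<user_agent>.*)"\s*'
)


def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


with_query = (
    'default 1.2.3.4 - - [05/Jan/2017:10:56:18 -0800] '
    '"GET /?simplode_ajax=true&simplode_query%5Border%5D=DESC HTTP/1.1" '
    '200 - "http://www.xxx.xx/xxx/xx/" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
without_query = (
    'www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] '
    '"POST /xxxxxx.php HTTP/1.1" 200 370 "-" "-"'
)

for sample in (with_query, without_query):
    fields = parse_line(sample)
    print(fields['path'], fields['querystring'])
```

The lazy \S+? takes as few characters as possible, so the optional group gets a chance to match at the first "?". When the request has no query string, the path group simply keeps extending until the following whitespace. Another option, not taken from the answer, would be a path group of [^?\s]+, which excludes "?" from the path outright.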