How to capture optional parameters from a URI with a regular expression

Time: 2015-10-26 13:52:40

Tags: regex python-3.x logging nginx

The following method parses a single line from an nginx log:

import re

def test_parse_line2(self):
    # Named groups to print, in the order they appear in the log line
    groups = ['ip', 'timestamp', 'offset', 'command', 'path', 'protocol', 'status', 'bytes', 'client']
    line = '1.2.3.4 - - [22/Oct/2015:12:01:49 -0500] "GET /mypath/?param1=value1&param2=value2 HTTP/1.1" 200 51 "-" "SomeRandomClient"'
    # Adjacent raw-string literals inside the parentheses are concatenated
    pattern = (
        r'(?P<ip>[^ ]+) - - \[(?P<timestamp>[^ ]+) (?P<offset>[-\+][0-9]{4})] "'
        r'(?P<command>[A-Z]+) /(?P<path>[^ ]+) (?P<protocol>[^"]+)" (?P<status>[0-9]+) (?P<bytes>[0-9]+) (?:[^ ]+)'
        r' "(?P<client>[^"]+)'
    )
    match = re.search(pattern, line)
    if match:
        for group_name in groups:
            print(group_name, match.group(group_name))

Is there a way to modify it so that I capture the required path mypath and the optional parameters param1=value1&param2=value2 separately?

1 Answer:

Answer 0: (score: 0)

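You need to replace the pattern matcher for the path with two separate matchers: (?P<mypath>[^?]+)\?(?P<myargs>[^ ]+)

Note that this form requires a ? to be present, so it will not match a request that has no query string. To make the parameters truly optional, keep spaces out of the path class and wrap the query part in an optional non-capturing group: (?P<mypath>[^? ]+)(?:\?(?P<myargs>[^ ]*))?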
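As a rough illustration, here is a minimal sketch of the question's pattern with the path group split into a required path and an optional query string. The names LOG_PATTERN and parse_line are hypothetical and not part of the original code. The group names mypath and myargs follow the answer. The optional non-capturing group around the query part is an assumption about what "optional" should mean here: a request without a ? still matches, and myargs is then None.

```python
import re

# Hypothetical compiled pattern: the question's regex, with the single
# `path` group replaced by a required `mypath` and an optional `myargs`.
LOG_PATTERN = re.compile(
    r'(?P<ip>[^ ]+) - - \[(?P<timestamp>[^ ]+) (?P<offset>[-+][0-9]{4})\] "'
    # `[^? ]+` stops the path at the first "?" or space; the query string
    # sits inside an optional non-capturing group, so it may be absent.
    r'(?P<command>[A-Z]+) /(?P<mypath>[^? ]+)(?:\?(?P<myargs>[^ ]*))? '
    r'(?P<protocol>[^"]+)" (?P<status>[0-9]+) (?P<bytes>[0-9]+) (?:[^ ]+)'
    r' "(?P<client>[^"]+)'
)


def parse_line(line):
    """Return a dict of named groups, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


# The sample line from the question, with and without a query string.
with_args = parse_line(
    '1.2.3.4 - - [22/Oct/2015:12:01:49 -0500] '
    '"GET /mypath/?param1=value1&param2=value2 HTTP/1.1" 200 51 "-" "SomeRandomClient"'
)
without_args = parse_line(
    '1.2.3.4 - - [22/Oct/2015:12:01:49 -0500] '
    '"GET /mypath/ HTTP/1.1" 200 51 "-" "SomeRandomClient"'
)

print(with_args['mypath'], with_args['myargs'])
print(without_args['mypath'], without_args['myargs'])
```

Like the original pattern, this sketch requires at least one character after the leading slash, so a request for the bare root path / would not match. Change [^? ]+ to [^? ]* if that case matters.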