Python and regular expressions

Time: 2015-12-17 16:41:23

Tags: python regex

I am trying to match the following strings:

 2 match virtual-address 172.29.210.119 tcp eq www
  4 match virtual-address 172.29.210.147 tcp any

Expected output:

 172.29.210.119
 tcp
 www

 172.29.210.147
 tcp
 any

I am using this pattern:

   match virtual-address (\d+\.\d+\.\d+\.\d+)\s?(\w+)? (?>eq)?\s?(\d+|\w+)

When I test the pattern at https://regex101.com/ I get the expected output.

But when I use the same pattern in Python, I get the following error:

 Traceback (most recent call last):
      File ".\ace2f5_parser.py", line 119, in <module>
        virtual_ip_proto_port = re.findall(pattern_virtual_ip_proto_port, line)
      File "C:\Users\hpokhare\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 213, in findall
        return _compile(pattern, flags).findall(string)
      File "C:\Users\hpokhare\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 293, in _compile
        p = sre_compile.compile(pattern, flags)
      File "C:\Users\hpokhare\AppData\Local\Programs\Python\Python35-32\lib\sre_compile.py", line 536, in compi
        p = sre_parse.parse(p, flags)
      File "C:\Users\hpokhare\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 829, in parse
        p = _parse_sub(source, pattern, 0)
      File "C:\Users\hpokhare\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 437, in _parse_
        itemsappend(_parse(source, state))
      File "C:\Users\hpokhare\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 767, in _parse
        len(char) + 1)
    sre_constants.error: unknown extension ?> at position 53

What does the error mean? Does Python not support ?> in a pattern? Any ideas on how to fix this?

2 answers:

Answer 0: (score: 3)

You can use this regex in Python:

\bmatch virtual-address (\d+\.\d+\.\d+\.\d+)\s?(\w+) (?:eq\s+)?(\w+)

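Python's re module does not support the atomic group syntax (?>...) the way PCRE does; that is what "unknown extension ?>" in the traceback means. (Atomic groups were only added to re in Python 3.11, well after the Python 3.5 used in the question.) A non-capturing group (?:...) is enough here.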
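As a quick check, the pattern above can be run against the two sample lines from the question. This is only a minimal sketch: the variable names (lines, pattern, results) are made up for illustration and are not from the asker's ace2f5_parser.py.

```python
import re

# The two sample lines from the question.
lines = [
    ' 2 match virtual-address 172.29.210.119 tcp eq www',
    '  4 match virtual-address 172.29.210.147 tcp any',
]

# (?:eq\s+)? is a non-capturing, optional group: "eq " is consumed when
# present but never shows up in the results.
pattern = re.compile(
    r'\bmatch virtual-address (\d+\.\d+\.\d+\.\d+)\s?(\w+) (?:eq\s+)?(\w+)'
)

results = []
for line in lines:
    # findall returns one (ip, protocol, port) tuple per match.
    results.extend(pattern.findall(line))

for ip, protocol, port in results:
    print(ip, protocol, port, sep='\n', end='\n\n')
```

Because the pattern has three capturing groups, re.findall returns a list of 3-tuples, which is why each result can be unpacked directly.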

Answer 1: (score: 0)

If you switch the regex101 flavor to "python", you will see that (?>eq)? cannot be used.
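The same check can be made locally instead of on regex101. The sketch below tries to compile the original pattern and reports whether the interpreter accepts it. The helper name compiles is hypothetical, and the version cutoff rests on the assumption that the interpreter is CPython, where re accepts (?>...) from Python 3.11 onward.

```python
import re
import sys

# The original pattern from the question, including the atomic group (?>eq).
PATTERN = r'match virtual-address (\d+\.\d+\.\d+\.\d+)\s?(\w+)? (?>eq)?\s?(\d+|\w+)'


def compiles(pattern):
    """Return True if re can compile the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        # On older interpreters this reports the unsupported ?> extension.
        print('re.error:', exc)
        return False


supported = compiles(PATTERN)
print('Python', sys.version_info[:2], 'accepts the pattern:', supported)
```

On an interpreter that rejects the pattern, rewriting (?>eq)? as the non-capturing (?:eq)? is the smallest change that lets it compile.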

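An alternative is to use $ to assert the position at the end of the line. Using (\w+)$ captures the last word on the line, whether or not "eq" comes before it.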

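import re

# Sample lines from the question.
text = [
    '2 match virtual-address 172.29.210.119 tcp eq www',
    '4 match virtual-address 172.29.210.147 tcp any'
]

# Group 1: IP address, group 2: protocol, group 3: last word on the line.
# The lazy .*? skips an optional "eq" between the protocol and the last word.
regexp = re.compile(r'match virtual-address (\d+\.\d+\.\d+\.\d+)\s(\w+).*?\s(\w+)$')
for line in text:
    match = regexp.search(line)
    if match is None:
        # Skip lines that do not contain the expected statement.
        continue
    ip, protocol, port = match.groups()
    print(ip, protocol, port, '', sep='\n')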