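Regex to match pattern.one, skipping newlines and characters, until pattern.two

Time: 2016-05-04 09:23:02

Tags: python regex

I need help with a regex that skips across multiple lines until it reaches a pattern. I could not find an existing question that covers this.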

Name of person 
 Jack
 Nichol 
 Age 42
 .....
 .....
 ....
Name of person
 Andrew
 Jason
 Age 54
...

... ...

How can I match this? Something like (Name.*(?:(\n)+).*(?:Age))

Consider the following:

interface TenGigE0/0/0/7



shutdown

!

interface TenGigE0/0/0/8



 bundle id 221 mode active

 lacp period short

 lacp period short receive 100

 lacp period short transmit 100

 carrier-delay up 100 down 100

 load-interval 30

 frequency synchronization

 !

 transceiver permit pid all

!

interface TenGigE0/0/0/9



 mtu 9216

 frequency synchronization

 !

 transceiver permit pid all

!

interface TenGigE0/0/0/10



 bundle id 237 mode active

 lacp period short

 lacp period short receive 100

 lacp period short transmit 100

 carrier-delay up 120000 down 150

 load-interval 30

 frequency synchronization

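How can I match every TenGigEx/x/x/x interface together with its corresponding carrier-delay line?

The result should look like this:

[interface TenGigE0/0/0/8, carrier-delay up 100 down 100] [interface TenGigE0/0/0/10, carrier-delay up 120000 down 150] ... and so on.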

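2 Answers:

Answer 0: (score: 2)

To match the text between a line containing TenGigE and the nearest following line containing carrier-delay, you need a tempered greedy token (or an unrolled version of it):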

(?sim)^([^\n]*TenGigE[^\n]*)(?:(?!TenGigE|carrier-delay).)*([^\n]*carrier-delay[^\n]*)

See the regex demo.

See the Python demo:

import re
p = re.compile(r'^([^\n]*TenGigE[^\n]*)(?:(?!TenGigE|carrier-delay).)*([^\n]*carrier-delay[^\n]*)', re.DOTALL | re.M | re.I)
test_str = "interface TenGigE0/0/0/8\n bundle id 221 mode active\n lacp period short\n lacp period short receive 100\n lacp period short transmit 100\n carrier-delay up 100 down 100\n\ninterface TenGigE0/0/0/7\n\n\n\nshutdown\n\n!\n\ninterface TenGigE0/0/0/8\n\n\n\n bundle id 221 mode active\n\n lacp period short\n\n lacp period short receive 100\n\n lacp period short transmit 100\n\n carrier-delay up 100 down 100\n\n load-interval 30\n\n frequency synchronization\n\n !\n\n transceiver permit pid all\n\n!\n\ninterface TenGigE0/0/0/9\n\n\n\n mtu 9216\n\n frequency synchronization\n\n !\n\n transceiver permit pid all\n\n!\n\ninterface TenGigE0/0/0/10\n\n\n\n bundle id 237 mode active\n\n lacp period short\n\n lacp period short receive 100\n\n lacp period short transmit 100\n\n carrier-delay up 120000 down 150\n\n load-interval 30\n\n frequency synchronization"
print(p.findall(test_str))
# => [('interface TenGigE0/0/0/8', 'carrier-delay up 100 down 100'), ('interface TenGigE0/0/0/8', 'carrier-delay up 100 down 100'), ('interface TenGigE0/0/0/10', 'carrier-delay up 120000 down 150')]
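The same tempered greedy token idea can be applied to the Name/Age sample from the question. The following is a minimal sketch, not part of the original answer: the variable names `text`, `pattern`, and `records` are illustrative, and it assumes that no name on the intermediate lines contains the substrings "Name" or "Age".

```python
import re

# Sample text modeled on the question's first example (the "....." line is filler).
text = (
    "Name of person\n"
    " Jack\n"
    " Nichol\n"
    " Age 42\n"
    " .....\n"
    "Name of person\n"
    " Andrew\n"
    " Jason\n"
    " Age 54\n"
)

# Start at a line beginning with "Name", then consume one character at a time
# as long as neither "Name" nor "Age" starts at the current position,
# and finally capture the rest of the "Age" line.
pattern = re.compile(
    r'^(Name[^\n]*)((?:(?!Name|Age).)*)(Age[^\n]*)',
    re.DOTALL | re.MULTILINE,
)

records = []
for m in pattern.finditer(text):
    names = m.group(2).split()  # the lines skipped between "Name" and "Age"
    records.append((names, m.group(3)))

print(records)
# => [(['Jack', 'Nichol'], 'Age 42'), (['Andrew', 'Jason'], 'Age 54')]
```

The lookahead `(?!Name|Age)` is what keeps a match from running past the next record: as soon as either word starts at the current position, the dot stops consuming.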

Update

A very robust regex that extracts the same text, based on the unroll-the-loop technique (an unrolled tempered greedy token):

(?sim)^([^\n]*TenGigE[^\n]*\n)[^T\n]*(?:T(?!enGigE)[^T\n]*|\n(?! carrier-delay)[^T\n]*)*(\n carrier-delay[^\n]*)

See the regex demo.
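As a rough sketch of how the unrolled pattern might be used from Python, the example below moves the inline `(?sim)` flags into `re.compile` arguments and strips the newline and space that the two groups capture. The names `unrolled`, `config`, and `pairs` are illustrative, and the sample input is a shortened, hypothetical configuration. Note that this pattern expects the carrier-delay line to start with exactly one space.

```python
import re

# The unrolled pattern from above, split across lines for readability.
unrolled = re.compile(
    r'^([^\n]*TenGigE[^\n]*\n)'
    r'[^T\n]*(?:T(?!enGigE)[^T\n]*|\n(?! carrier-delay)[^T\n]*)*'
    r'(\n carrier-delay[^\n]*)',
    re.DOTALL | re.MULTILINE | re.IGNORECASE,
)

# Hypothetical sample: the first interface has no carrier-delay line.
config = (
    "interface TenGigE0/0/0/7\n"
    "shutdown\n"
    "!\n"
    "interface TenGigE0/0/0/8\n"
    " bundle id 221 mode active\n"
    " carrier-delay up 100 down 100\n"
    " load-interval 30\n"
    "!\n"
)

# Group 1 ends with "\n" and group 2 starts with "\n ", so strip both.
pairs = [(iface.strip(), delay.strip()) for iface, delay in unrolled.findall(config)]

print(pairs)
# => [('interface TenGigE0/0/0/8', 'carrier-delay up 100 down 100')]
```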

Answer 1: (score: 0)

You could come up with:

(?:^(interface\ TenGigE
(?:\d+/?){4}))
(?:(?!(?:carrier-delay|interface))[\s\S])+
(?P<carrier>carrier-delay\ .+)

In Python, this would be:

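import re

# Raw string, so that \d and \s reach the regex engine unchanged.
rx = re.compile(r"""
(?:^(interface\ TenGigE
(?:\d+/?){4}))
(?:(?!(?:carrier-delay|interface))[\s\S])+
(?P<carrier>carrier-delay\ .+)""", re.VERBOSE | re.MULTILINE)

# `string` holds the configuration text to search (a short sample here).
string = "interface TenGigE0/0/0/8\n bundle id 221 mode active\n carrier-delay up 100 down 100\n"
matches = rx.findall(string)  # list of (interface, carrier-delay line) tuples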

Compared with @Wiktor's answer (which needs more than 200k steps), this one needs only about 3k; see a demo on regex101.com (thanks to him for spotting an earlier inaccuracy).
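The step counts above come from the regex101 debugger and depend on the engine it runs, so they do not translate directly into Python timings. One rough way to compare the two patterns in Python is to check that they return the same pairs and then time them with `timeit`. This is only a sketch: the names `tempered`, `anchored`, and `sample` are illustrative, the input is a shortened hypothetical configuration, and the measured times will vary by machine and input size.

```python
import re
import timeit

# Pattern from answer 0 (tempered greedy token).
tempered = re.compile(
    r'^([^\n]*TenGigE[^\n]*)(?:(?!TenGigE|carrier-delay).)*([^\n]*carrier-delay[^\n]*)',
    re.DOTALL | re.MULTILINE | re.IGNORECASE,
)

# Pattern from answer 1 (anchored on "interface TenGigE" plus four numbers).
anchored = re.compile(r"""
(?:^(interface\ TenGigE
(?:\d+/?){4}))
(?:(?!(?:carrier-delay|interface))[\s\S])+
(?P<carrier>carrier-delay\ .+)""", re.VERBOSE | re.MULTILINE)

# Hypothetical sample input, shortened from the question's configuration.
sample = (
    "interface TenGigE0/0/0/7\n"
    "shutdown\n"
    "!\n"
    "interface TenGigE0/0/0/8\n"
    " bundle id 221 mode active\n"
    " carrier-delay up 100 down 100\n"
    "!\n"
    "interface TenGigE0/0/0/10\n"
    " bundle id 237 mode active\n"
    " carrier-delay up 120000 down 150\n"
    " load-interval 30\n"
)

results_tempered = tempered.findall(sample)
results_anchored = anchored.findall(sample)
same = results_tempered == results_anchored  # both should find the same pairs

# Wall-clock time for 1000 runs of each pattern; values vary by machine.
t_tempered = timeit.timeit(lambda: tempered.findall(sample), number=1000)
t_anchored = timeit.timeit(lambda: anchored.findall(sample), number=1000)

print(same)
print(f"tempered: {t_tempered:.4f}s  anchored: {t_anchored:.4f}s")
```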