Regex: correctly matching groups with a negative lookbehind

Time: 2017-01-11 21:26:00

Tags: python regex

I am working with this string:

qr/I Love Chocolate|And Free Shipping|All (day|night)|please/i;

I am using the following regex pattern:

(?:qr\/)?(.*?)(?:\||\/)

I expect to get the following matches:

["I Love Chocolate", "And Free Shipping", "All (day|night)", "please"]

However, this is what I actually get:

["I Love Chocolate", "And Free Shipping", "All (day", "night)", "please"]
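For reference, the result above can be reproduced with a short script. This is only a sketch: it assumes Python's re module and re.findall (suggested by the python tag), and the variable names are illustrative.

```python
import re

s = "qr/I Love Chocolate|And Free Shipping|All (day|night)|please/i;"
pattern = r"(?:qr\/)?(.*?)(?:\||\/)"

# The lazy group (.*?) stops at the first | or / it meets,
# including the | that sits inside the parentheses.
matches = re.findall(pattern, s)
print(matches)
```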

I modified my regex to use a lookbehind:

(?:qr\/)?(?<!All \(day|night\))(.*?)(?:\||\/)

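However, this still splits the string at the pipe inside All (day|night).

How can I adjust the regex so that, instead of capturing All (day and night) as separate strings, I get All (day|night)?

More generally, the goal in Muggle-speak is: "Find any groups delimited by a pipe character, unless the group contains one or more pipe characters surrounded by parentheses; in that case, treat the whole string as one group."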
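To make that goal concrete, here is a minimal non-regex sketch of the intended behaviour. It assumes the qr/ prefix and /i; suffix have already been stripped and that parentheses are balanced; split_top_level is a hypothetical helper name, not something from a library.

```python
def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # A separator outside any parentheses ends the current group.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

body = "I Love Chocolate|And Free Shipping|All (day|night)|please"
groups = split_top_level(body)
print(groups)
```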

2 Answers:

Answer 0: (score: 3)

You can use this regex for matching:

[^/|(]+(?:\([^)]*\))*

Code:

>>> import re
>>> s = 'qr/I Love Chocolate|And Free Shipping|All (day|night)|please/i'
>>> print(re.findall(r'[^/|(]+(?:\([^)]*\))*', s))
['qr', 'I Love Chocolate', 'And Free Shipping', 'All (day|night)', 'please', 'i']

Or, if you want to drop the leading qr/ and the trailing /i, use:

>>> print(re.findall(r'[^/|(]+(?:\([^)]*\))*', re.sub(r'^qr/(.*)/i$', r'\1', s)))
['I Love Chocolate', 'And Free Shipping', 'All (day|night)', 'please']
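As a reading aid, the same pattern can be written with re.VERBOSE so that each piece carries a comment. This is only an annotated restatement of the regex above; the variable names are illustrative.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE for readability.
pattern = re.compile(r"""
    [^/|(]+          # one or more characters that are not /, | or (
    (?:              # optionally followed by ...
        \( [^)]* \)  # ... a parenthesised group; any | inside it is kept
    )*               # ... repeated zero or more times
""", re.VERBOSE)

body = "I Love Chocolate|And Free Shipping|All (day|night)|please"
parts = pattern.findall(body)
print(parts)
```

Note that a match ends right after the closing parenthesis, so text that follows a parenthesised group within the same field (for example "All (day|night) long") would come back as a separate item.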


Answer 1: (score: 2)

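If the only | that must be kept is the one between the words day and night, you can use a negative lookbehind and a negative lookahead: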

>>> import re
>>> s = "qr/I Love Chocolate|And Free Shipping|All (day|night)|please/i;"
>>> re.split(r"(?<!day)\|(?!night)", s)
['qr/I Love Chocolate', 'And Free Shipping', 'All (day|night)', 'please/i;']

I would also strip the qr/ prefix and the /i suffix beforehand, to keep the split expression simple. For example, like this:

>>> s = "qr/I Love Chocolate|And Free Shipping|All (day|night)|please/i;"
>>> s = re.sub(r"^[a-z]+/", "", s)
>>> s = re.sub(r"/[a-z]+;$", "", s)

Then, split:

>>> re.split(r"(?<!day)\|(?!night)", s)
['I Love Chocolate', 'And Free Shipping', 'All (day|night)', 'please']
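If the words inside the parentheses are not always day and night, a possible variation (not part of the answer above, just a sketch) is to split on any | that is not followed by a closing parenthesis before the next opening one. This assumes the parentheses are balanced and not nested.

```python
import re

s = "I Love Chocolate|And Free Shipping|All (day|night)|please"

# Split on a pipe only when no ")" follows it before the next "(",
# i.e. when the pipe is not inside a parenthesised group.
parts = re.split(r"\|(?![^()]*\))", s)
print(parts)
```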