Non-capturing parentheses with lookbehind and lookahead - Python

Date: 2018-02-06 21:22:15

Tags: python regex regex-lookarounds

So I want to capture the indices in a string like this:

 "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

I want to capture the strings string_1, string_2, u2 and 0.

I was able to do this with the following regex:

import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

re.findall(r"("
           r"((?<=\[u')|(?<=\['))"  # Begins with [u' or ['
           r"[a-zA-Z0-9_\-]+"  # Followed by any letters, numbers, _'s, or -'s
           r"(?='\])"  # Ending with ']
           r")"
           r"|"  # OR
           r"("
           r"(?<=\[)"  # Begins with [
           r"[0-9]+"  # Followed by any numbers
           r"(?=\])"  # Ending with ]
           r")", message)

The problem is that the result contains tuples padded with empty strings, like this:

[('string_1', '', ''), ('string_2', '', ''), ('u2', '', ''), ('', '', '0')]
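The tuples follow from how `re.findall` treats groups: when a pattern has more than one capturing group, it returns one tuple per match with one string per group, and a group that did not take part in the match contributes `''`. The sketch below, which assumes the `message` string from the question, counts the groups in the first pattern (written with raw strings). Note that group 2 wraps only zero-width lookbehinds, so it can never capture anything but `''`.

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# The question's first pattern, rewritten with raw strings.
original = re.compile(
    r"("                       # group 1 opens
    r"((?<=\[u')|(?<=\['))"    # group 2: only lookbehinds, so it always captures ''
    r"[a-zA-Z0-9_\-]+"
    r"(?='\])"
    r")"                       # group 1 closes
    r"|"
    r"("                       # group 3 opens
    r"(?<=\[)"
    r"[0-9]+"
    r"(?=\])"
    r")"                       # group 3 closes
)

# Three capturing groups, so findall returns one 3-tuple per match.
print(original.groups)
result = original.findall(message)
print(result)
```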

Now, I could easily filter the empty strings out of the result, but I'd like to prevent them from appearing in the first place.
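One way to avoid post-filtering without touching the pattern is to use `re.finditer` and read `group(0)`, which is always the whole matched text regardless of how many capturing groups the pattern has. A minimal sketch, assuming the same `message` and the question's first pattern:

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# Same pattern as the first attempt; its capturing groups are left as they are.
pattern = re.compile(
    r"(((?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))"
    r"|"
    r"((?<=\[)[0-9]+(?=\]))"
)

# finditer yields Match objects; group(0) is the full match, ignoring the groups.
result = [m.group(0) for m in pattern.finditer(message)]
print(result)
```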

I think this is because of my capture groups. I tried adding ?: to those groups, but then my results came back empty.

This is how I tried to do it:

re.findall(r"(?:"
           r"((?<=\[u')|(?<=\['))"  # Begins with [u' or ['
           r"[a-zA-Z0-9_\-]+"  # Followed by any letters, numbers, _'s, or -'s
           r"(?='\])"  # Ending with ']
           r")"
           r"|"  # OR
           r"(?:"
           r"(?<=\[)"  # Begins with [
           r"[0-9]+"  # Followed by any numbers
           r"(?=\])"  # Ending with ]
           r")", message)

This gives the following output:

['', '', '', '']
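The empty strings appear to come from the one capturing group the second attempt still contains, `((?<=\[u')|(?<=\['))`. It holds only lookbehinds, which are zero-width, so it captures `''`. Because it is the only group left, `findall` returns its value for every match, and that value is also `''` for the numeric match, where the group does not participate. A sketch that makes that group non-capturing too, assuming the question's `message`, leaves no groups at all, so `findall` returns the whole matches:

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

pattern = re.compile(
    r"(?:(?<=\[u')|(?<=\['))"  # preceded by [u' or [' (now non-capturing)
    r"[a-zA-Z0-9_\-]+"         # letters, digits, _ or -
    r"(?='\])"                 # followed by ']
    r"|"                       # OR
    r"(?<=\[)"                 # preceded by [
    r"[0-9]+"                  # digits
    r"(?=\])"                  # followed by ]
)

# No capturing groups, so findall returns the full text of each match.
result = pattern.findall(message)
print(result)
```

Lookarounds work fine inside non-capturing groups; what changes the output of `findall` is only whether any capturing groups remain.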

I assume the problem is caused by using lookbehinds together with non-capturing groups. Any ideas on whether this can be done in Python?

Thanks

2 answers:

Answer 0 (score: 1)

Regex: (?<=\[)(?:[^'\]]*')?([^'\]]+)

Python code:

import re

def Years(text):
    # Capture what follows each "[", skipping an optional prefix such as u' or '
    return re.findall(r'(?<=\[)(?:[^\'\]]*\')?([^\'\]]+)', text)

print(Years("Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"))

Output:

['string_1', 'string_2', 'u2', '0']
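The same pattern can be spelled out piece by piece with `re.VERBOSE`, which ignores whitespace and `#` comments outside character classes. This is only a commented restatement of the answer's regex, assuming the question's `message`; it has a single capturing group, so `findall` returns plain strings:

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

pattern = re.compile(r"""
    (?<=\[)          # position must be right after a [
    (?:[^'\]]*')?    # optionally skip a prefix ending in a quote, e.g. u' or '
    ([^'\]]+)        # capture everything up to the next quote or ]
""", re.VERBOSE)

result = pattern.findall(message)
print(result)
```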

Answer 1 (score: 1)

You can simplify the regex:

(?<=\[)u?'?([a-zA-Z0-9_\-]+)(?='?\])

See the demo:

https://regex101.com/r/SA6shx/1
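For reference, a minimal sketch of using this pattern from Python, assuming the question's `message`. With exactly one capturing group, `findall` returns that group's text for each match:

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# Optional u and optional quote after "[", then the captured index,
# then an optional quote before "]".
pattern = re.compile(r"(?<=\[)u?'?([a-zA-Z0-9_\-]+)(?='?\])")

result = pattern.findall(message)
print(result)
```

Because `u?` and `'?` are independent, an unquoted index that starts with `u`, such as `[u2]`, would come back as `2`; that case does not occur in the question's input.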