I want to capture the indices in a string like this:
"Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"
I want to capture the strings string_1, string_2, u2, and 0.
I was able to do this with the following regex:
re.findall(r"("
           r"((?<=\[u')|(?<=\['))"  # Begins with [u' or ['
           r"[a-zA-Z0-9_\-]+"       # Followed by any letters, numbers, _'s, or -'s
           r"(?='\])"               # Ending with ']
           r")"
           r"|"                     # OR
           r"("
           r"(?<=\[)"               # Begins with [
           r"[0-9]+"                # Followed by any numbers
           r"(?=\])"                # Ending with ]
           r")", message)
The problem is that the result contains tuples padded with empty strings, like this:
[('string_1', '', ''), ('string_2', '', ''), ('u2', '', ''), ('', '', '0')]
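I could easily filter the empty strings out of the result, but I would rather prevent them from appearing in the first place.
I believe this is caused by my capturing groups. I tried using ?: in those groups, but then my results disappeared entirely.
This is how I tried to do it: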
re.findall(r"(?:"
           r"((?<=\[u')|(?<=\['))"  # Begins with [u' or ['
           r"[a-zA-Z0-9_\-]+"       # Followed by any letters, numbers, _'s, or -'s
           r"(?='\])"               # Ending with ']
           r")"
           r"|"                     # OR
           r"(?:"
           r"(?<=\[)"               # Begins with [
           r"[0-9]+"                # Followed by any numbers
           r"(?=\])"                # Ending with ]
           r")", message)
This produces the following output:
['', '', '', '']
I assume the problem comes from combining lookbehinds with non-capturing groups. Any ideas on whether this can be done in Python?
Thanks
Answer 0 (score: 1)
Regex: (?<=\[)(?:[^'\]]*')?([^'\]]+)
or \[(?:[^'\]]*')?([^'\]]+)
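To make the pieces of the second variant easier to follow, here is a sketch of it written with re.VERBOSE. The names INDEX_RE and message are only illustrative and are not part of the original answer; the pattern itself is unchanged.

```python
import re

# Hypothetical name for the compiled pattern; the regex is the \[ variant above.
INDEX_RE = re.compile(r"""
    \[              # opening bracket (consumed, not captured)
    (?:[^'\]]*')?   # optional prefix ending in a quote, such as u' or '
    ([^'\]]+)       # group 1: the index itself, up to the next quote or bracket
""", re.VERBOSE)

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# With exactly one capturing group, findall returns a flat list of that group.
indices = INDEX_RE.findall(message)
print(indices)  # ['string_1', 'string_2', 'u2', '0']
```

Because there is a single capturing group, re.findall returns plain strings rather than tuples, which is what avoids the empty entries from the question.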
Python code:
import re

def Years(text):
    # Capture the token after each '[', skipping an optional u' or ' prefix.
    return re.findall(r'(?<=\[)(?:[^\'\]]*\')?([^\'\]]+)', text)
print(Years('Something bad happened! @ data[u\'string_1\'][u\'string_2\'][\'u2\'][0]'))
Output:
['string_1', 'string_2', 'u2', '0']
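As a side note on why the question's second attempt returned only empty strings: re.findall returns the contents of the capturing groups whenever the pattern has any, and in that attempt the lookbehind alternation ((?<=\[u')|(?<=\[')) is still a capturing group. Lookbehinds are zero-width, so that group is always empty. The sketch below condenses the question's pattern onto one line (the variable names are illustrative) and shows that making the inner group non-capturing as well appears to be enough.

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# The question's second attempt, condensed: the outer groups are non-capturing,
# but the lookbehind alternation is still a capturing (and zero-width) group.
one_group = r"(?:((?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))|(?:(?<=\[)[0-9]+(?=\]))"
empty_results = re.findall(one_group, message)

# Same pattern with the inner group made non-capturing too, so findall
# falls back to returning the whole match.
no_groups = r"(?:(?:(?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))|(?:(?<=\[)[0-9]+(?=\]))"
indices = re.findall(no_groups, message)

print(empty_results)  # ['', '', '', '']
print(indices)        # ['string_1', 'string_2', 'u2', '0']
```

So lookbehinds and non-capturing groups do work together in Python; the only requirement is that no capturing group remains if the full match is what findall should return.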
Answer 1 (score: 1)