Question

I am trying to match multiple substrings within a string.

The fields of interest have the following format:

Sample1: "text text text[One]"
Sample2: "text text text[One/Two]"
Sample3: "text text text[One/Two/Three]"

I tried to get the list of numbers with the following regular expression:

numbers = re.findall('(\[|\/)(\w+)(\/|\])', str)

However, group 2 yields:

#Sample1
['One']
#Sample2
['One']
#Sample3
['One','Three']

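No matter what I try, I cannot match the second number, the one between '/' and either ']' or '/'. I don't understand why it does not match '/Two/', since the '/' character is one of the options in both alternations.

I also tried structuring it differently with the following regex: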

re.findall('[\[]?[\/]?(\w+)[\/]?[\]]?', str)

While it gave me the results I wanted, it also gave me every word in the preceding text.

Any suggestions are appreciated.

Answer 1

You can try this:

s = ["text text text[One]", "text text text[One/Two]",  "text text text[One/Two/Three]"]
import re
final_data = [[b.split('/') for b in re.findall('\[(.*?)\]', i)][0] for i in s]

Output:

[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]

Answer 2

Use a lookbehind and a lookahead so that [, / and ] are not part of the match:

>>> [re.findall('(?<=\[|\/)\w+(?=\/|\])', s) for s in samples]
[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]

This way, the middle / can be used for both matches.
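
To make the difference concrete, here is a small sketch that runs the question's consuming pattern and the lookaround pattern side by side. The `samples` list and the variable names `consuming` and `lookaround` are chosen here for illustration only. Because `re.findall` and `re.finditer` return non-overlapping matches, a "/" that ends one match in the consuming pattern is no longer available to start the next one:

```python
import re

# Sample strings taken from the question.
samples = [
    "text text text[One]",
    "text text text[One/Two]",
    "text text text[One/Two/Three]",
]

# Pattern from the question: each match consumes its trailing delimiter,
# so the next match cannot begin at that same "/".
consuming = re.compile(r"(\[|/)(\w+)(/|\])")

# Lookaround version: the delimiters are checked but not consumed.
lookaround = re.compile(r"(?<=\[|/)\w+(?=/|\])")

for s in samples:
    consumed = [m.group(2) for m in consuming.finditer(s)]
    shared = lookaround.findall(s)
    print(consumed, shared)
```

For the third sample, the consuming pattern yields 'One' and 'Three' only, while the lookaround pattern yields all three words.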

Answer 3

You can also try this regex:

import re
regex = r"\[.+?\]"
Sample1= "text text text[One]"
Sample2= "text text text[One/Two]"
Sample3= "text text text[One/Two/Three]"
lines=[Sample1,Sample2,Sample3]
subres = [re.findall(r"\[(.+[^\/])\]", s) for s in lines]
result = [res[0].split('/') for res in subres]

print(result)

Result:

[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]

Answer 4

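If you are sure the target string always has the format you have shown, why not first find all the numbers separated by slashes, and then split the result on /?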

Sample3 = "text text text[One/Two/Three]"
re.findall('\[(.*)\]', Sample3)[0].split('/')

Output:

['One', 'Two', 'Three']
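
If the format assumption might not always hold, a slightly more defensive variant of the same find-then-split idea could look like the sketch below. The helper name `extract_items` is hypothetical, and two choices here are assumptions rather than part of the answer: it uses `[^\]]*` instead of the greedy `.*` so that a string with several bracketed groups stops at the first `]`, and it returns an empty list when there are no brackets instead of raising `IndexError`:

```python
import re

def extract_items(text):
    """Return the slash-separated items of the first [...] group, or []."""
    # [^\]]* stops at the first closing bracket, unlike a greedy .*
    match = re.search(r"\[([^\]]*)\]", text)
    if match is None:
        return []
    return match.group(1).split("/")

print(extract_items("text text text[One/Two/Three]"))
```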

Python regex boolean 'or' does not select all matches

4 answers: