I have the following regular expression (using Python syntax):
(\d+)x(\d+)(?:\s+)?-(?:\s+)?([^\(\)]+)(?:\s+)?\((\d+)(?:(?:\s+)?-(?:\s+)?([^\(\)]+))?\)(?:(?:\s+)?\(([^\(\)]+)\))?(?:(?:\s+)?-(?:\s+)?([^\(\)]+) \((\d+)\))?
It matches strings in one of the following forms:
21x04 - Some Text (04)
6x03 - Some Text (00 - Some Text)
6x03 - Some Text (00 - Some Text) (Some Text)
23x01 - Some Text (10) - Some Text (02)
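The numbers and text vary, and both are captured. The spacing is not always consistent, however, so the pattern is designed to allow any amount of whitespace.
Is there a way to simplify it? I'm not necessarily asking someone to do it for me, just to tell me whether there is a tool (Googling turned up a few, but none of them could handle it) or a systematic method.
Or can anyone see a better regex for this situation?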
Answer 0 (score: 1)
You can drop some of the optional non-capturing groups. For example, you can change this:
(\d+)x(\d+)(?:\s+)?-(?:\s+)?([^\(\)]+)(?:\s+)?\((\d+)(?:(?:\s+)?-(?:\s+)?([^\(\)]+))?\)(?:(?:\s+)?\(([^\(\)]+)\))?(?:(?:\s+)?-(?:\s+)?([^\(\)]+) \((\d+)\))?
to this:
(\d+)x(\d+)\W+([^()]+)\D+\((\d+)(?:\W*-\W*([^()]+))?\)(?:\W*\(([^()]+)\))?(?:\W*-\W*([^()]+) \((\d+)\))?
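I replaced some of the (?:\s+)? groups with \W*. You also don't need to escape parentheses inside a character class, so [^\(\)] can be written as [^()].
By the way, you can also try this regex, which might be useful to you:
(\d+)x(\d+)|-\s*([\w\s]+)|(\w+)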
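As a rough sketch of how the alternation pattern above might be used: because it is a set of alternatives rather than one anchored pattern, it fits re.findall, which returns one tuple per match with empty strings for the groups that did not take part. The helper name tokens is hypothetical and only flattens those tuples.

```python
import re

# The alternation pattern from the answer, unchanged.
TOKEN = re.compile(r"(\d+)x(\d+)|-\s*([\w\s]+)|(\w+)")

def tokens(line):
    """Flatten findall's tuples, dropping empty groups and stray whitespace.

    `tokens` is a hypothetical helper name used for illustration only.
    """
    out = []
    for groups in TOKEN.findall(line):
        out.extend(g.strip() for g in groups if g.strip())
    return out

print(tokens("21x04 - Some Text (04)"))
print(tokens("6x03 - Some Text (00 - Some Text) (Some Text)"))
```

One caveat: text that is not preceded by a hyphen, such as the trailing "(Some Text)" in the third sample form, is only reached by the (\w+) alternative, so it comes back as separate words rather than one string.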
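On the "systematic method" part of the question, two mechanical steps may help: (?:\s+)? is equivalent to \s*, and Python's re.VERBOSE flag lets the pattern be split across commented lines. The following is a minimal sketch under those assumptions, not the answerer's pattern. It keeps \s* rather than \W* (which would also accept punctuation), and it uses lazy [^()]+? so that trailing spaces stay out of the captured text, which is a small change from the original regex. The names EPISODE and parse are hypothetical.

```python
import re

# Hypothetical rewrite of the question's regex in verbose mode.
# Whitespace and comments inside the pattern are ignored by re.VERBOSE.
EPISODE = re.compile(r"""
    (\d+) x (\d+)                               # two numbers joined by x
    \s* - \s*
    ([^()]+?)                                   # first text, trimmed by laziness
    \s* \( (\d+)                                # opening paren and a number
        (?: \s* - \s* ([^()]+?) \s* )?          # optional hyphen and text
    \)
    (?: \s* \( ([^()]+) \) )?                   # optional second parenthesised text
    (?: \s* - \s* ([^()]+?) \s* \( (\d+) \) )?  # optional hyphen, text, number
""", re.VERBOSE)

def parse(line):
    """Return the eight capture groups as a tuple, or None if no match."""
    m = EPISODE.match(line)
    return m.groups() if m else None

for sample in ("21x04 - Some Text (04)",
               "23x01 - Some Text (10) - Some Text (02)",
               "21x04-Some Text(04)"):
    print(parse(sample))
```

Groups that belong to an optional part that is absent come back as None, so the position of each value in the tuple still tells you which field it is.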
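Answer 1 (score: 0)
To simplify the problem, consider splitting it into two parts: 1. get the strings (which may contain digits or letters), and 2. where a string contains digits, get the numbers: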
data = '''21x04 - Some Text (04)
6x03 - Some Text (00 - Some Text)
6x03 - Some Text (00 - Some Text) (Some Text)
23x01 - Some Text (10) - Some Text (02)'''
import re
# the regex to extract your data as strings
aaa = re.compile(r'[\w\s]+')
# the regex to extract the numbers from the strings
nnn = re.compile(r'\d+')
for line in data.split('\n'):
    matches = aaa.findall(line)
    groups = []
    for m in matches:
        m = m.strip()
        n = nnn.findall(m)
        # skip whitespace-only chunks; keep text as-is, or its numbers if it has any
        if m != '':
            groups.extend([m] if n == [] else n)
    print(groups)
# ['21', '04', 'Some Text', '04']
# ['6', '03', 'Some Text', '00', 'Some Text']
# ['6', '03', 'Some Text', '00', 'Some Text', 'Some Text']
# ['23', '01', 'Some Text', '10', 'Some Text', '02']