Simplifying a regexp

Date: 2015-04-08 22:31:14

Tags: python regex simplification

I have the following regular expression (in Python syntax):

(\d+)x(\d+)(?:\s+)?-(?:\s+)?([^\(\)]+)(?:\s+)?\((\d+)(?:(?:\s+)?-(?:\s+)?([^\(\)]+))?\)(?:(?:\s+)?\(([^\(\)]+)\))?(?:(?:\s+)?-(?:\s+)?([^\(\)]+) \((\d+)\))?

It matches strings of one of the following forms:

21x04 - Some Text (04)
6x03 - Some Text (00 - Some Text)
6x03 - Some Text (00 - Some Text) (Some Text)
23x01 - Some Text (10) - Some Text (02)

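The numbers and the text vary, and both are captured. The spacing is not always consistent, however, so the pattern is designed to allow any amount of whitespace.

Is there a way to simplify it? I'm not necessarily asking someone to do it for me, just to tell me whether there is a tool (a Google search turned up some results, but none of them could handle it) or a systematic approach.

Or can anyone see a better regex for this situation?

2 answers:

Answer 0: (score: 1)

You can drop some of the optional non-capturing groups. For example, you can change this: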

(\d+)x(\d+)(?:\s+)?-(?:\s+)?([^\(\)]+)(?:\s+)?\((\d+)(?:(?:\s+)?-(?:\s+)?([^\(\)]+))?\)(?:(?:\s+)?\(([^\(\)]+)\))?(?:(?:\s+)?-(?:\s+)?([^\(\)]+) \((\d+)\))?

to this:

(\d+)x(\d+)\W+([^()]+)\D+\((\d+)(?:\W*-\W*([^()]+))?\)(?:\W*\(([^()]+)\))?(?:\W*-\W*([^()]+) \((\d+)\))?
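A quick way to compare the two patterns above is to run both against the four sample strings from the question. The sketch below is only a check, not part of the original answer; the helper name `stripped_groups` is made up for illustration, and it strips whitespace from each captured group before comparing.

```python
import re

# Both patterns are copied from the text above, split across lines for readability.
original = re.compile(
    r"(\d+)x(\d+)(?:\s+)?-(?:\s+)?([^\(\)]+)(?:\s+)?\((\d+)"
    r"(?:(?:\s+)?-(?:\s+)?([^\(\)]+))?\)(?:(?:\s+)?\(([^\(\)]+)\))?"
    r"(?:(?:\s+)?-(?:\s+)?([^\(\)]+) \((\d+)\))?"
)
simplified = re.compile(
    r"(\d+)x(\d+)\W+([^()]+)\D+\((\d+)(?:\W*-\W*([^()]+))?\)"
    r"(?:\W*\(([^()]+)\))?(?:\W*-\W*([^()]+) \((\d+)\))?"
)

samples = [
    "21x04 - Some Text (04)",
    "6x03 - Some Text (00 - Some Text)",
    "6x03 - Some Text (00 - Some Text) (Some Text)",
    "23x01 - Some Text (10) - Some Text (02)",
]


def stripped_groups(pattern, text):
    """Hypothetical helper: captured groups with surrounding whitespace removed."""
    match = pattern.match(text)
    return tuple(g.strip() if g is not None else None for g in match.groups())


for s in samples:
    same = stripped_groups(original, s) == stripped_groups(simplified, s)
    print(s, "->", stripped_groups(simplified, s), "same as original:", same)
```

On these four samples the two patterns agree once whitespace is stripped. Without stripping, group 3 differs: the original captures "Some Text " with a trailing space, while the simplified pattern leaves that space to \D+.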


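I replaced some of the (?:\s+)? with \W*, and you also don't have to escape parentheses inside a character class, so [^\(\)] can be written as [^()]. Note that \W* also matches punctuation, so it is more permissive than the original whitespace-only groups.

By the way, you could also try this regex, which may work for you:

(\d+)x(\d+)|-\s*([\w\s]+)|(\w+)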
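The alternation pattern (\d+)x(\d+)|-\s*([\w\s]+)|(\w+) from this answer matches one piece at a time rather than the whole line, so it has to be applied repeatedly. A minimal sketch of one way to do that with re.finditer follows; the function name `tokens` is an assumption made for illustration, not something from the answer.

```python
import re

# The alternation pattern from this answer: each match is one piece of the line.
token = re.compile(r"(\d+)x(\d+)|-\s*([\w\s]+)|(\w+)")


def tokens(line):
    """Hypothetical helper: collect every non-empty group from successive matches."""
    out = []
    for m in token.finditer(line):
        # Only the groups of the alternative that matched are not None.
        out.extend(g.strip() for g in m.groups() if g is not None)
    return out


for line in [
    "21x04 - Some Text (04)",
    "6x03 - Some Text (00 - Some Text)",
    "6x03 - Some Text (00 - Some Text) (Some Text)",
    "23x01 - Some Text (10) - Some Text (02)",
]:
    print(tokens(line))
```

One thing to watch: text that is not preceded by a dash, such as the final "(Some Text)" in the third sample, is picked up by the last alternative (\w+) and therefore comes back as separate words.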

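Answer 1: (score: 0)

To simplify the problem, consider splitting it into two parts: 1. get the strings (which may contain digits or letters), and 2. get the numbers whenever a string contains digits: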

data = '''21x04 - Some Text (04)
6x03 - Some Text (00 - Some Text)
6x03 - Some Text (00 - Some Text) (Some Text)
23x01 - Some Text (10) - Some Text (02)'''

import re

# the regex to extract your data as strings
# (raw strings keep \w, \s and \d from being treated as string escapes)
aaa = re.compile(r'[\w\s]+')

# the regex to extract the numbers from the strings
nnn = re.compile(r'\d+')

for line in data.split('\n'):
    matches = aaa.findall(line)
    groups = []
    for m in matches:
        m = m.strip()
        n = nnn.findall(m)
        if m != '':
            groups.extend([m] if n == [] else n)
    print(groups)

    # ['21', '04', 'Some Text', '04']
    # ['6', '03', 'Some Text', '00', 'Some Text']
    # ['6', '03', 'Some Text', '00', 'Some Text', 'Some Text']
    # ['23', '01', 'Some Text', '10', 'Some Text', '02']