Optional matching in regular expressions

Time: 2017-04-12 17:32:41

Tags: python regex

I am trying to match these input strings into three capture groups (Regex101 link):

    | input string  | x  | y   | z  |
------------------------------------
  I | a             | a  |     |    |
 II | a - b         | a  | b   |    |
III | a - b-c       | a  | b-c |    |
 IV | a - b, 12     | a  | b   | 12 |
  V | a - 12        | a  |     | 12 |
 VI | 12            |    |     | 12 |

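So the anatomy of the input strings is as follows:

  • An optional first part of free text, up to a hyphen surrounded by whitespace ( - ) or the end of the input string
  • An optional second part with any characters after the first hyphen surrounded by whitespace, up to a comma or the end of the input string
  • Two optional digits at the end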

I have tried a multitude of different solutions; this is my current attempt:

^(?P<x>.*)(?:-)(?P<y>.*)(?<!\d)(?P<z>\d{0,2})(?!\d)$

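It handles scenarios II, IV and V okay (the results need some trimming of whitespace), but:

  • I and VI are not returned at all
  • III is not split at the first hyphen, but at the last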
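The behaviour described above can be reproduced with a short script. This is only a sketch: the pattern is the one from the question, while the names `ATTEMPT`, `CASES` and `attempt` are made up for illustration.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r"^(?P<x>.*)(?:-)(?P<y>.*)(?<!\d)(?P<z>\d{0,2})(?!\d)$")

# The six inputs from the table, keyed by scenario number.
CASES = {
    "I": "a",
    "II": "a - b",
    "III": "a - b-c",
    "IV": "a - b, 12",
    "V": "a - 12",
    "VI": "12",
}


def attempt(text):
    """Return the named groups as a dict, or None when nothing matches."""
    m = ATTEMPT.match(text)
    return m.groupdict() if m else None


for label, text in CASES.items():
    print(label, repr(text), attempt(text))
```

Scenarios I and VI fail because the hyphen in the pattern is mandatory, and scenario III splits at the last hyphen because the first `.*` is greedy.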

3 answers:

Answer 0 (score: 5)

This seems to do a reasonably good job:

^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$

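The values of interest will be in groups 1, 2 and 3, respectively.

The only catch is that the "two digits" will be

  • in group 2 for case V and
  • in group 1 for case VI;

the last group stays empty in both cases.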

This is because the "two digits" happily match the "free text until the delimiter or the end of the string" rule.
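A quick way to see where the digits end up is to print the groups for all six inputs. The sketch below uses the pattern exactly as given in this answer; `PLAIN` and `groups_of` are illustrative names only.

```python
import re

# First pattern from this answer (no look-ahead guard).
PLAIN = re.compile(r"^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$")


def groups_of(text):
    """Return groups 1..3 as a tuple; None marks a group that did not take part."""
    m = PLAIN.match(text)
    return m.groups() if m else None


for text in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    print(repr(text), groups_of(text))
```

Note that an optional group that matched nothing is reported as an empty string, while one that was skipped entirely is reported as `None`, so callers probably want to treat both as "empty".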

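You could use a negative look-ahead to force the two digits into the last group, but that is only correct if "two digits" is never a legal value for groups 1 and 2. In any case, it makes the expression unwieldy: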

^(?:((?!\d\d$).*?)(?: - |$))?(?:((?!\d\d$).*?)(?:, |$))?(\d\d$)?$

Breakdown:

^                    # string starts
(?:(.*?)(?: - |$))?  # any text, reluctantly, and " - " or the string ends
(?:(.*?)(?:, |$))?   # any text, reluctantly, and ", " or the string ends
(\d\d$)?             # two digits and the string ends
$                    # string ends
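For comparison, the look-ahead variant from this answer can be checked the same way. Again a sketch under the assumption that each input is matched on its own; `GUARDED` and `guarded_groups` are hypothetical names.

```python
import re

# Second pattern from this answer: (?!\d\d$) keeps a trailing two-digit
# value out of groups 1 and 2.
GUARDED = re.compile(
    r"^(?:((?!\d\d$).*?)(?: - |$))?(?:((?!\d\d$).*?)(?:, |$))?(\d\d$)?$"
)


def guarded_groups(text):
    """Return groups 1..3 as a tuple, or None when the pattern does not match."""
    m = GUARDED.match(text)
    return m.groups() if m else None


for text in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    print(repr(text), guarded_groups(text))
```

With the guard in place, cases V and VI report the digits in group 3. The trade-off mentioned above still applies: a first or second part that consists of exactly two digits can no longer be captured in group 1 or 2.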

Answer 1 (score: 3)

There are less redundant regexes that accomplish this task, but this one encodes the logic in a very straightforward way:

^(?P<x>(?!\d\d$)(?:(?! - ).)*)?(?: - (?P<y>(?!\d\d$)[^,\n]*)?(?:, )?)?(?P<z>\d\d)?$
^                   # assert start of string/line
(?P<x>              # capture in group "x"
    (?!\d\d$)       # if the whole string is just two digits, don't capture them in group x
    (?:             # as long as...
        (?! - )     # ...we don't come across the text " - "...
        .           # ...consume the next character
    )*
)?                  # make group x optional
(?:                 # if possible...
     -              # consume the " - " separator
    (?P<y>          # then capture group "y"
        (?!\d\d$)   # again, only if this isn't two digits which belong in group z
        [^,\n]*     # consume everything up to a comma
    )?              # group y is also optional
    (?:, )?         # consume the ", " separator, if present
)?
(?P<z>              # finally, capture in group "z"...
    \d\d            # ...two digits...
)?                  # ...if present
$                   # assert end of string
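The following sketch runs the one-line form of this pattern against the six inputs. The one-line form is used on purpose: if the annotated layout were compiled with `re.VERBOSE` as written, the unescaped spaces in ` - ` and `, ` would be ignored and would need escaping first. `NAMED` and `parse` are illustrative names.

```python
import re

# One-line form of the pattern from this answer.
NAMED = re.compile(
    r"^(?P<x>(?!\d\d$)(?:(?! - ).)*)?"
    r"(?: - (?P<y>(?!\d\d$)[^,\n]*)?(?:, )?)?"
    r"(?P<z>\d\d)?$"
)


def parse(text):
    """Return a dict with keys x, y, z; groups that did not take part are None."""
    m = NAMED.match(text)
    return m.groupdict() if m else None


for text in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    print(repr(text), parse(text))
```

Because the separators are consumed outside the named groups, the captured values need no trimming for these inputs.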

Answer 2 (score: 2)

Interesting problem, here is the solution I came up with:

^
    (?:(?P<x>\D*?)(?=(?:\ -\ |$)))?
    (?:.*?(?<=\ -\ )(?P<y>[^\d,]+)(?=,|$))?
    (?:.*?(?P<z>\d{2}$))?
$

See a demo on regex101.com (and mind the verbose [aka x] and multiline [aka m] modifiers):

In more detail:

^                       # start of the line
    (?:                 # non capturing parentheses
        (?P<x>\D*?)     # no digits lazily ...
        (?=\ -\ |$)     # up until either " - " or end of string
    )?                  # optional
    (?:
        .*?             # match everything lazily
        (?<=\ -\ )      # pos. lookbehind
        (?P<y>[^\d,]+)  # not a comma or digit
        (?=,|$)         # up until a comma or end of string
    )?
    (?:
        .*?
        (?P<z>\d{2}$)   # two digits at the end
    )?
$
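In Python, the two modifiers correspond to `re.VERBOSE` and `re.MULTILINE`. The sketch below compiles the pattern from this answer with both flags and matches each input separately; `PATTERN` and `fields` are names chosen for illustration.

```python
import re

# Pattern from this answer; the escaped spaces (\ ) survive re.VERBOSE.
PATTERN = re.compile(
    r"""
    ^
        (?:(?P<x>\D*?)(?=(?:\ -\ |$)))?
        (?:.*?(?<=\ -\ )(?P<y>[^\d,]+)(?=,|$))?
        (?:.*?(?P<z>\d{2}$))?
    $
    """,
    re.VERBOSE | re.MULTILINE,
)


def fields(text):
    """Return a dict with keys x, y, z; groups that did not take part are None."""
    m = PATTERN.match(text)
    return m.groupdict() if m else None


for text in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    print(repr(text), fields(text))
```

One thing to keep in mind with this variant: `\D` and `[^\d,]` exclude digits, so free text containing digits would not be captured in x or y.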