I am trying to match these input strings into three groups (Regex101 link):
    | input string | x | y   | z  |
-----------------------------------
I   | a            | a |     |    |
II  | a - b        | a | b   |    |
III | a - b-c      | a | b-c |    |
IV  | a - b, 12    | a | b   | 12 |
V   | a - 12       | a |     | 12 |
VI  | 12           |   |     | 12 |
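So the anatomy of the input strings is as follows:
- an optional first part of free text, up to a hyphen surrounded by spaces (" - ") or the end of the input string
- an optional second part of arbitrary characters after the first space-surrounded hyphen, up to a comma or the end of the input string
- an optional two digits at the end

I have tried many different solutions; this is my current attempt: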
^(?P<x>.*)(?:-)(?P<y>.*)(?<!\d)(?P<z>\d{0,2})(?!\d)$
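It handles cases II, IV and V fine (some trimming of whitespace is still needed), but:
- I and VI are not matched at all
- III is not split at the first hyphen, but at the last

Answer 0 (score: 5)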
This seems to do a pretty good job:
^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$
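The values of interest will be in groups 1, 2 and 3, respectively.
The only catch is that in cases V and VI, where other groups are expected to be empty, the "two digits" are captured by one of those groups instead of group 3.
This is because the "two digits" happily match the "free text up to the delimiter, or end of string" rule.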
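The catch is easy to reproduce. The snippet below is a minimal sketch, assuming Python's `re` module (the question's `(?P<name>...)` syntax suggests Python, but that is an assumption); `PATTERN` and `groups_for` are illustrative names that do not appear in the original answer.

```python
import re

# The first pattern from this answer, unchanged.
PATTERN = re.compile(r"^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$")

def groups_for(text):
    """Return the three captured groups for text, or None if there is no match."""
    m = PATTERN.match(text)
    return m.groups() if m else None

# Cases V and VI from the question's table.
for text in ("a - 12", "12"):
    print(repr(text), "->", groups_for(text))
```

For `a - 12` the digits land in group 2, and for `12` they land in group 1; group 3 stays `None` in both cases.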
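You could use a negative look-ahead to force the two digits into the last group, but that is only correct if "two digits" is not a legal value for groups 1 and 2. In any case, it makes the expression unwieldy: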
^(?:((?!\d\d$).*?)(?: - |$))?(?:((?!\d\d$).*?)(?:, |$))?(\d\d$)?$
Breakdown:
^                    # string starts
(?:(.*?)(?: - |$))?  # any text, reluctantly, and " - " or the string ends
(?:(.*?)(?:, |$))?   # any text, reluctantly, and ", " or the string ends
(\d\d$)?             # two digits and the string ends
$                    # string ends
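To check the look-ahead variant against the whole table from the question, a small harness along these lines may help. It is a sketch only, again assuming Python's `re` module; `LOOKAHEAD_PATTERN`, `EXPECTED` and `parse` are hypothetical names, and folding unmatched groups into empty strings is a choice made here so the output lines up with the blank cells in the table.

```python
import re

# The look-ahead variant from this answer, unchanged.
LOOKAHEAD_PATTERN = re.compile(
    r"^(?:((?!\d\d$).*?)(?: - |$))?(?:((?!\d\d$).*?)(?:, |$))?(\d\d$)?$"
)

# (input, x, y, z) rows taken from the table in the question.
EXPECTED = [
    ("a", "a", "", ""),
    ("a - b", "a", "b", ""),
    ("a - b-c", "a", "b-c", ""),
    ("a - b, 12", "a", "b", "12"),
    ("a - 12", "a", "", "12"),
    ("12", "", "", "12"),
]

def parse(text):
    """Return (x, y, z), with unmatched or empty groups normalised to ''."""
    m = LOOKAHEAD_PATTERN.match(text)
    if m is None:
        return None
    return tuple(g or "" for g in m.groups())

for text, *expected in EXPECTED:
    result = parse(text)
    print(f"{text!r:12} -> {result}  ok={result == tuple(expected)}")
```

Every row should report `ok=True`. Keep in mind the caveat above: an x or y value that itself consists of exactly two digits at the end of the string would be pushed into group 3.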
Answer 1 (score: 3)
There are probably less verbose regexes that accomplish this task, but this one encodes the logic in a very straightforward way:
^(?P<x>(?!\d\d$)(?:(?! - ).)*)?(?: - (?P<y>(?!\d\d$)[^,\n]*)?(?:, )?)?(?P<z>\d\d)?$
^                     # assert start of string/line
(?P<x>                # capture in group "x"
  (?!\d\d$)           # if the whole string is just two digits, don't capture them in group x
  (?:                 # as long as...
    (?! - )           # ...we don't come across the text " - "...
    .                 # ...consume the next character
  )*
)?                    # make group x optional
(?:                   # if possible...
   -                  # consume the " - " separator
  (?P<y>              # then capture group "y"
    (?!\d\d$)         # again, only if this isn't two digits which belong in group z
    [^,\n]*           # consume everything up to a comma
  )?                  # group y is also optional
  (?:, )?             # consume the ", " separator, if present
)?
(?P<z>                # finally, capture in group "z"...
  \d\d                # ...two digits...
)?                    # ...if present
$                     # assert end of string
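As a usage sketch, the named groups can be read back as a dictionary. This assumes Python's `re` module; `NAMED_PATTERN` and `parse_named` are illustrative names, and the pattern is the single-line form given above, not the spaced-out explanation (which contains literal spaces and is not meant to be compiled in verbose mode).

```python
import re

# The named-group pattern from this answer, split over lines for readability only.
NAMED_PATTERN = re.compile(
    r"^(?P<x>(?!\d\d$)(?:(?! - ).)*)?"
    r"(?: - (?P<y>(?!\d\d$)[^,\n]*)?(?:, )?)?"
    r"(?P<z>\d\d)?$"
)

def parse_named(text):
    """Return a dict with keys x, y, z; groups that did not take part become ''."""
    m = NAMED_PATTERN.match(text)
    return m.groupdict(default="") if m else None

for text in ("a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"):
    print(f"{text!r:12} -> {parse_named(text)}")
```

Passing `default=""` to `groupdict` maps groups that did not participate in the match to empty strings, which corresponds to the blank cells in the question's table.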
Answer 2 (score: 2)
Interesting problem; here is the solution I came up with:
^
(?:(?P<x>\D*?)(?=(?:\ -\ |$)))?
(?:.*?(?<=\ -\ )(?P<y>[^\d,]+)(?=,|$))?
(?:.*?(?P<z>\d{2}$))?
$
See a demo on regex101.com (and note the verbose [aka x] and multiline [aka m] modifiers):
^ # start of the line
(?: # non capturing parentheses
(?P<x>\D*?) # no digits lazily ...
(?=\ -\ |$) # up until either " - " or end of string
)? # optional
(?:
.*? # match everything lazily
(?<=\ -\ ) # pos. lookbehind
(?P<y>[^\d,]+) # not a comma or digit
(?=,|$) # up until a comma or end of string
)?
(?:
.*?
(?P<z>\d{2}$) # two digits at the end
)?
$
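Outside regex101, the two modifiers have to be passed explicitly. The following is a minimal sketch assuming Python's `re` module, where they correspond to `re.VERBOSE` and `re.MULTILINE`; `VERBOSE_PATTERN` and `parse_verbose` are hypothetical names. Each input is matched on its own here, because `\D` and `[^\d,]` can also match a newline, so feeding all inputs as one multi-line string may behave differently.

```python
import re

# The verbose pattern from this answer. Literal spaces are written as "\ "
# because unescaped whitespace is ignored in verbose mode.
VERBOSE_PATTERN = re.compile(
    r"""
    ^
    (?:(?P<x>\D*?)(?=(?:\ -\ |$)))?
    (?:.*?(?<=\ -\ )(?P<y>[^\d,]+)(?=,|$))?
    (?:.*?(?P<z>\d{2}$))?
    $
    """,
    re.VERBOSE | re.MULTILINE,
)

def parse_verbose(text):
    """Return a dict with keys x, y, z; groups that did not take part become ''."""
    m = VERBOSE_PATTERN.match(text)
    return m.groupdict(default="") if m else None

for text in ("a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"):
    print(f"{text!r:12} -> {parse_verbose(text)}")
```

Note that this pattern excludes digits from x and y altogether (`\D` and `[^\d,]`), which is a stricter reading of "free text" than the question states.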