Question

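I have a string that looks like: 1A2B3C, or 2B3C, or 1A2B, or 1A3C.

The string is made up of several optional parts of the form number + [A|B|C].

Is it possible to get the number in front of each letter using a single regular expression?

For example: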

1A2B3C => (1, 2, 3)
1A3C => (1, 0, 3) There is no 'B', so it gives 0 instead.
     => Or just (1, 3), but it should show that the 3 is in front of 'C'.

Answer 1

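I'm assuming Python, both because you use tuple notation and because it's what I'm used to.

If the only letters allowed are A, B, and C, you can do it with a single regex plus one extra processing step: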

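import re

# Every number+letter pair is optional, so inputs like '2B3C' also match
pattern = re.compile(r'(?:(\d+)A)?(?:(\d+)B)?(?:(\d+)C)?')
match = pattern.fullmatch(some_string)
if match:
    # groups('0') substitutes '0' for any group that did not participate
    result = tuple(int(g) for g in match.groups('0'))
else:
    raise ValueError('Bad input string')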

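Each option is wrapped in a non-capturing group (?:...), so a trailing ? makes the number and its letter optional together as one unit. Inside each unit there is a capturing group (\d+) that captures the number, followed by the letter, which is not captured.

The method Match.groups returns a tuple of all the groups in the regex, with groups that did not match set to '0'. The generator expression then converts each value to int. You could equally write tuple(map(int, match.groups('0'))).

You can also use a dictionary to hold the numbers, keyed by letter: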
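As a quick check of the default-value behavior described above, here is a minimal sketch that wraps the pattern in a function. The name parse_abc is made up for illustration, and the sketch assumes every letter's group should be optional so that inputs such as '2B3C' are accepted.

```python
import re

# Every number+letter pair is optional; fullmatch still rejects stray text.
ABC_PATTERN = re.compile(r'(?:(\d+)A)?(?:(\d+)B)?(?:(\d+)C)?')


def parse_abc(text):
    """Hypothetical helper: numbers in front of A, B and C (0 if absent)."""
    match = ABC_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError('Bad input string')
    # groups('0') fills in '0' for groups that did not participate.
    return tuple(map(int, match.groups('0')))


print(parse_abc('1A2B3C'))  # (1, 2, 3)
print(parse_abc('1A3C'))    # (1, 0, 3)
print(parse_abc('2B3C'))    # (0, 2, 3)
```

Because all three parts are optional, an empty string also matches and yields (0, 0, 0); add an explicit check if that should count as bad input.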

# As before, every named group is optional so that '2B3C' is accepted
pattern = re.compile(r'(?:(?P<A>\d+)A)?(?:(?P<B>\d+)B)?(?:(?P<C>\d+)C)?')
match = pattern.fullmatch(some_string)
if match:
    result = {k: int(v) for k, v in match.groupdict('0').items()}
else:
    raise ValueError('Bad input string')

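Match.groupdict is similar to groups, except that it returns a dictionary of the named groups: capturing groups written as (?P<NAME>...).

Finally, if you don't mind having a dictionary, you can use this approach to parse any number of groups with arbitrary letters: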
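To make the effect of the default argument concrete, the following sketch compares groupdict with and without it on one of the question's inputs. The constant name NAMED_PATTERN is only for illustration, and all three groups are assumed to be optional.

```python
import re

NAMED_PATTERN = re.compile(
    r'(?:(?P<A>\d+)A)?(?:(?P<B>\d+)B)?(?:(?P<C>\d+)C)?'
)

match = NAMED_PATTERN.fullmatch('1A3C')

# Without a default, groups that did not participate come back as None.
raw = match.groupdict()
print(raw)      # {'A': '1', 'B': None, 'C': '3'}

# With a default of '0', every value can be passed straight to int().
counts = {k: int(v) for k, v in match.groupdict('0').items()}
print(counts)   # {'A': 1, 'B': 0, 'C': 3}
```

Keeping the None values instead of '0' is one way to get the "(1, 3), but showing the 3 belongs to C" form from the question: simply drop the keys whose value is None.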

pattern = re.compile(r'(\d+)([A-Z])')
result = {}
while some_string:
    match = pattern.match(some_string)
    if not match:
        raise ValueError('Bad input string')
    result[match.group(2)] = int(match.group(1))
    some_string = some_string[match.end():]
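The loop above can also be written without re-slicing the string, by passing a start position to Pattern.match. This is a sketch under that assumption rather than the answer's own code; parse_pairs is a hypothetical name.

```python
import re

PAIR = re.compile(r'(\d+)([A-Z])')


def parse_pairs(text):
    """Hypothetical helper: map each letter to the number in front of it."""
    result = {}
    pos = 0
    while pos < len(text):
        # match(text, pos) only matches starting at pos, so any text that
        # is not a number followed by a letter is rejected.
        match = PAIR.match(text, pos)
        if match is None:
            raise ValueError(f'Bad input at position {pos}: {text!r}')
        result[match.group(2)] = int(match.group(1))
        pos = match.end()
    return result


print(parse_pairs('1A3C'))   # {'A': 1, 'C': 3}
print(parse_pairs('10X2Y'))  # {'X': 10, 'Y': 2}
```

Note that with this approach a repeated letter silently overwrites the earlier value, and letters may appear in any order; both differ from the fixed A, B, C pattern.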
