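RegEx: find all numbers after a specific string

Time: 2016-02-10 08:43:30

Tags: python regex

I am trying to get all the numbers that follow "classes" (or one of its variants) in a string such as this one: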

Accepted for all the goods and services in classes 16 and 41.

Expected output:

16
41

I have several strings that follow this pattern. Some of the others look like this:

classes 5 et 30 # expected output 5, 30
class(es) 32,33 # expected output 32, 33
class 16        # expected output 16

Here is what I have tried so far: https://regex101.com/r/eU7dF6/3

(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+

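But I only get the last matched number, i.e. 41 in the example above.

2 Answers:

Answer 0: (score: 1)

You can do it in two steps. The regex engine keeps only the last value captured by a repeated group.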

import re
x = """Accepted for all the goods and services in classes 16 and 41."""
print(re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))

Output: ['16', '41']

If you don't want strings, use:

import ast
print(list(map(ast.literal_eval, re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))))

Output: [16, 41]
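
As a rough sketch, the two-step approach above can be wrapped in a small helper and run against the other sample strings from the question. The name numbers_after_class is hypothetical and not part of the answer. The helper also assumes you want an empty list, rather than the IndexError that indexing with [0] would raise, when no "class" phrase is present.

```python
import re

# Pattern from the answer above: "class", optional "es"/"(es)" characters,
# then a run of numbers separated by "and", "et", commas, or whitespace.
CLASS_RUN = re.compile(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)")

def numbers_after_class(text):
    """Return the integers following the first "class..." phrase, or []."""
    match = CLASS_RUN.search(text)
    if match is None:
        return []  # assumption: no match means "no numbers", not an error
    # Step 2: pull the individual numbers out of the captured run.
    return [int(n) for n in re.findall(r"\d+", match.group(1))]

samples = [
    "Accepted for all the goods and services in classes 16 and 41.",
    "classes 5 et 30",
    "class(es) 32,33",
    "class 16",
    "no matching phrase here",
]
for s in samples:
    print(numbers_after_class(s))
```

Using int here instead of ast.literal_eval is a design choice for the sketch: the inner pattern only ever yields digit runs, so int is sufficient.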

If it has to be done in a single regular expression, use the regex module:

import ast
import regex
x = """Accepted for all the goods and services in classes 16 and 41."""
print([ast.literal_eval(i) for i in regex.findall(r"class[\(es\)]*|\G(?:and|et|,|\s)*(\d+)", x, regex.VERSION1) if i])

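Output: [16, 41]

Answer 1: (score: 1)

I suggest grabbing every substring that consists of class / classes / class(es) followed by numbers, and then extracting all the numbers from those substrings: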

import re
p = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')
test_str = "Accepted for all the goods and services in classes 16 and 41."
results = [re.findall(r"\d+", x) for x in p.findall(test_str)]
print([x for l in results for x in l])
# => ['16', '41']

See the IDEONE demo.
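
As a quick check, here is a minimal sketch that runs the same pattern over the other sample strings from the question. The last string, which mentions "class" twice, is a made-up example and does not come from the question.

```python
import re

# Same pattern as in the snippet above; it has no capturing groups,
# so findall() returns the whole matched substring each time.
p = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')

def digits_after_class(text):
    # Flatten the digit runs found in every "class ..." substring.
    return [n for chunk in p.findall(text) for n in re.findall(r"\d+", chunk)]

print(digits_after_class("classes 5 et 30"))             # ['5', '30']
print(digits_after_class("class(es) 32,33"))             # ['32', '33']
print(digits_after_class("class 16"))                    # ['16']
print(digits_after_class("class 3 and classes 9, 12"))   # ['3', '9', '12']
```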

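Your approach cannot work with Python's re module, because re neither supports the \G construct nor gives access to the capture stack.

However, you can do it that way with the PyPI regex module: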

>>> import regex
>>> test_str = "Accepted for all the goods and services in classes 16 and 41."
>>> rx = r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*(?P<num>\d+))+'
>>> res = []
>>> for x in regex.finditer(rx, test_str):
...     res.extend(x.captures("num"))
...
>>> print(res)
['16', '41']
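
For comparison, here is a minimal sketch of what the standard re module returns for the same pattern. The whole phrase is matched, but the repeated named group num only exposes its last capture, which is the behavior described in the question.

```python
import re

rx = r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*(?P<num>\d+))+'
test_str = "Accepted for all the goods and services in classes 16 and 41."

m = re.search(rx, test_str)
whole = m.group(0)       # the full matched phrase
last = m.group("num")    # only the final repetition of the group survives
print(whole)  # classes 16 and 41
print(last)   # 41
```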