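Extract only the digits preceded by a backslash (\)

Time: 2015-03-09 00:35:59

Tags: regex python-2.7

I want to use a regular expression to extract only the digits that are preceded by a backslash (\):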

filer = ('in this \002eld has established some theoretical guidelines. '
         'Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].')

I tried the following:

import re

ss = set(re.findall(r'\b\d+\b', filer))
print(ss)

But this returns all the other numbers instead.

Output:     set(['24','1','3','5'])

Note that the digits I actually want are not returned.

3 Answers:

Answer 0: (score: 1)

You can try a lookbehind like this:

(?<=\\)\d+
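
A minimal sketch of how this lookbehind could be used from Python, assuming the input is written as a raw string so that the backslashes are really present in the text (the variable name `digits` is illustrative):

```python
import re

# Raw string literals: the backslash and the digits "002" stay literal characters.
filer = (r'in this \002eld has established some theoretical guidelines. '
         r'Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].')

# (?<=\\) asserts that a backslash precedes the match without consuming it,
# so only the digits themselves are returned.
digits = re.findall(r'(?<=\\)\d+', filer)
print(digits)  # ['002', '002']
```

Because the lookbehind is zero-width, no capture group is needed to drop the backslash from the results.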

Answer 1: (score: 0)

You can do:

>>> import re
>>> filer = r'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r"\\\d+", filer)
['\\002', '\\002']


Your regex is wrong because it matches every run of digits that is enclosed by word boundaries, i.e.:

>>> s = r'matches: 123 \456 789 ; but not: \321aoeu ao654 ao\987oa'
>>> re.findall(r'\b\d+\b', s)
['123', '456', '789']

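So your regex does not match the digits in \002eld or \002ts, because the letters immediately to their right mean there is no word boundary after the digits. It would, however, also match the backslash-prefixed digits if the text were: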

'in this \002 eld has established some theoretical guidelines. 
Besides such immediate bene\002 ts of lower costs 24 [1], [3], [5].'
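
This behaviour can be checked with a short sketch that runs the question's pattern on both variants of the text. The shortened sample strings below are made up for illustration, and they are written as raw strings so that the backslash is a literal character:

```python
import re

pattern = r'\b\d+\b'

# Digits glued to letters: there is no word boundary after "002", so it is skipped.
glued = r'in this \002eld has lower costs 24 [1]'
# Digits followed by a space: a word boundary exists on both sides of "002".
spaced = r'in this \002 eld has lower costs 24 [1]'

glued_matches = re.findall(pattern, glued)
spaced_matches = re.findall(pattern, spaced)
print(glued_matches)   # ['24', '1']
print(spaced_matches)  # ['002', '24', '1']
```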

The visualization below (click to play in the original post) shows why it matches only the trailing numbers:

[Regular expression visualization]

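Answer 2: (score: 0)

First, you need to define the input as a raw string; otherwise the \002 in the string literal is interpreted as an octal escape and becomes a single control character (U+0002) rather than a backslash followed by digits.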

>>> import re
>>> filer = r'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r'\\(\d+)', filer)
['002', '002']
>>> filer = 'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r'\\(\d+)', filer)
[]
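
The difference between the two results can be seen by inspecting the literals directly. A small sketch using only built-ins (the names `plain` and `raw` are illustrative):

```python
# In a normal string literal, \002 is an octal escape for one control character.
plain = 'bene\002ts'
# In a raw string literal, the backslash and the three digits are kept as-is.
raw = r'bene\002ts'

print(len(plain))            # 7
print(len(raw))              # 10
print(plain[4] == '\x02')    # True: a single U+0002 character
print(raw[4:8] == '\\002')   # True: a backslash followed by "002"
```

Since the non-raw string contains no backslash at all, a pattern that looks for `\\` has nothing to match, which is why the second `findall` call returns an empty list.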