Question

I want to use a regular expression to extract only the numbers that are preceded by a backslash (\):

filer = 'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'

I tried the following:

ss = set(re.findall(r'\b\d+\b', filer))
print ss

But all the other numbers were returned instead.

Output: set(['24', '1', '3', '5'])

Note that the numbers I actually want are not returned.

Answer 1

You can try a lookbehind like this:

(?<=\\)\d+
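
As a minimal sketch of how this lookbehind behaves in Python, assuming the input is written as a raw string so that the backslashes are kept literally (the name backslash_numbers is only illustrative):

```python
import re

# Assumption: the input is a raw string, so "\002" keeps its literal backslash.
filer = r'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'

# (?<=\\) requires a backslash immediately before the digits,
# but does not include that backslash in the match.
backslash_numbers = re.findall(r'(?<=\\)\d+', filer)
print(backslash_numbers)  # ['002', '002']
```

Because the lookbehind is zero-width, the result contains only the digits, with no need for a capture group.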

Answer 2

You can do:

>>> import re
>>> filer = r'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r"\\\d+", filer)
['\\002', '\\002']

Your regex is wrong because it matches every run of digits that sits between word boundaries, i.e.:

>>> s = r'matches: 123 \456 789 ; but not: \321aoeu ao654 ao\987oa'
>>> re.findall(r'\b\d+\b', s)
['123', '456', '789']

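So your regex does not match the digits in \002eld or \002ts, because the letters immediately to the right of the digits leave no word boundary there. It would, however, also match the digits after a backslash if the input were: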

'in this \002 eld has established some theoretical guidelines. 
Besides such immediate bene\002 ts of lower costs 24 [1], [3], [5].'
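
The word-boundary behaviour described above can be checked with a short sketch. The two sample strings are illustrative fragments of the input, and both are raw strings so that the backslashes stay literal:

```python
import re

pattern = r'\b\d+\b'

# Digits directly followed by letters: there is no word boundary after "002".
glued = r'bene\002ts of lower costs 24 [1]'
# Digits followed by a space: "002" has a word boundary on both sides.
spaced = r'bene\002 ts of lower costs 24 [1]'

glued_matches = re.findall(pattern, glued)
spaced_matches = re.findall(pattern, spaced)
print(glued_matches)   # ['24', '1']
print(spaced_matches)  # ['002', '24', '1']
```

In both cases the pattern says nothing about a backslash, which is why it cannot single out the numbers the question asks for.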


Answer 3

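First, you need to define the input as a raw string; otherwise the \002 in the string literal is interpreted as an octal escape sequence and converted to a single control character (U+0002).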

>>> import re
>>> filer = r'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r'\\(\d+)', filer)
['002', '002']
>>> filer = 'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r'\\(\d+)', filer)
[]
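
To see why the non-raw literal yields no matches, here is a small sketch comparing the two kinds of literal on a shortened, illustrative sample. In a normal string literal, \002 collapses to one character, so neither the backslash nor the digits remain in the string:

```python
import re

raw = r'bene\002ts'    # 10 characters: the backslash and "002" are kept literally
cooked = 'bene\002ts'  # 7 characters: \002 becomes one control character

lengths = (len(raw), len(cooked))
is_control_char = cooked[4] == '\x02'
raw_matches = re.findall(r'\\(\d+)', raw)
cooked_matches = re.findall(r'\\(\d+)', cooked)

print(lengths)          # (10, 7)
print(is_control_char)  # True
print(raw_matches)      # ['002']
print(cooked_matches)   # []
```

If the text comes from a file or another source rather than a literal in the code, no escape processing happens and the backslashes are already literal.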

Extract only the numbers preceded by a backslash (\)

3 answers: