I want to use a regular expression to extract only the numbers that are preceded by a backslash (\):
filer = 'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
I tried the following:
ss = set(re.findall(r'\b\d+\b', filer))
print ss
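But it returned all the other numbers instead.
Output: set(['24','1','3','5'])
Note that the numbers I actually want (the ones after the backslashes) are not returned.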
Answer 0 (score: 1)
You can try a lookbehind like this:
(?<=\\)\d+
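A minimal sketch of that lookbehind in use, written for Python 3 and assuming the input is stored as a raw string so that the backslashes survive as literal characters (the name `backslash_numbers` is only illustrative):

```python
import re

# Raw string: the backslash stays a literal character instead of
# starting an escape sequence such as \002.
filer = r'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'

# (?<=\\) asserts that a backslash sits immediately before the digits
# without making the backslash part of the match.
backslash_numbers = re.findall(r'(?<=\\)\d+', filer)
print(backslash_numbers)  # ['002', '002']
```

Because the lookbehind consumes nothing, no capturing group is needed to drop the backslash from the result.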
Answer 1 (score: 0)
>>> import re
>>> filer = r'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r"\\\d+", filer)
['\\002', '\\002']
Your regex is wrong because it matches every run of digits that is delimited by word boundaries on both sides, for example:
>>> s = r'matches: 123 \456 789 ; but not: \321aoeu ao654 ao\987oa'
>>> re.findall(r'\b\d+\b', s)
['123', '456', '789']
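So your regex does not match the digits in \002eld or \002ts, because the letters directly to their right mean there is no word boundary after the digits. It would, however, also match the numbers after \ if the text were: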
'in this \002 eld has established some theoretical guidelines.
Besides such immediate bene\002 ts of lower costs 24 [1], [3], [5].'
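The word-boundary behaviour described above can be checked directly. This is a sketch for Python 3 that assumes raw strings and a shortened sample sentence; `glued` and `spaced` are hypothetical names introduced for the comparison:

```python
import re

pattern = re.compile(r'\b\d+\b')

# Digits glued to letters: there is no word boundary between "2" and "t".
glued = r'bene\002ts of lower costs 24 [1], [3], [5].'
# Digits followed by a space: a boundary now exists on both sides.
spaced = r'bene\002 ts of lower costs 24 [1], [3], [5].'

glued_matches = pattern.findall(glued)
spaced_matches = pattern.findall(spaced)
print(glued_matches)   # ['24', '1', '3', '5']
print(spaced_matches)  # ['002', '24', '1', '3', '5']
```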
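Answer 2 (score: 0)
First, you need to define the input as a raw string. Otherwise the \002 in the string is treated as an octal escape and converted to a single control character (code point 2), so no backslash or digits remain to be matched.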
>>> filer = r'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r'\\(\d+)', filer)
['002', '002']
>>> filer = 'in this \002eld has established some theoretical guidelines. Besides such immediate bene\002ts of lower costs 24 [1], [3], [5].'
>>> re.findall(r'\\(\d+)', filer)
[]
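Why the second call returns an empty list can be seen by inspecting the strings themselves. A small sketch for Python 3, using a shortened sample word (`plain` and `raw` are illustrative names):

```python
import re

plain = 'bene\002ts'  # \002 is an octal escape: one character, code point 2
raw = r'bene\002ts'   # backslash, '0', '0', '2' are kept as four characters

print(len(plain), len(raw))           # 7 10
print(plain[4] == '\x02')             # True
print(re.findall(r'\\(\d+)', plain))  # []
print(re.findall(r'\\(\d+)', raw))    # ['002']
```

Once the literal has been parsed without the `r` prefix, the original digits are gone from the string, so no regex can recover them.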