Question

I want to use a regular expression to extract every number longer than 2 digits from a string, while also excluding "2016". This is what I have:

import re

string = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 "

print(re.findall(r'\d{3,}', string))

Output:

['856', '2016', '112']

I tried changing it to the following to exclude "2016", but every attempt failed.

print(re.findall(r'\d{3,}/^(!2016)/', string))
print(re.findall(r"\d{3,}/?!2016/", string))
print(re.findall(r"\d{3,}!'2016'", string))

What is the correct way to do this? Thanks.

The question has been extended; see Wiktor Stribiżew's final comment for the update.

Answer 1

You want to use a negative lookahead. The correct syntax is:

\D(?!2016)(\d{3,})\b

Result:

In [24]: re.findall(r'\D(?!2016)(\d{3,})\b', string)
Out[24]: ['856', '112']

Or use a negative lookbehind:

In [26]: re.findall(r'\D(\d{3,})(?<!2016)\b', string)
Out[26]: ['856', '112']
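Two edge cases are worth checking before relying on either pattern. Both start with `\D`, which has to consume one non-digit character, so a number at the very start of the string is never matched. In addition, the lookahead form rejects any number that merely starts with 2016, and the lookbehind form rejects any number that merely ends with 2016. A minimal sketch, using a made-up sample string that is not part of the question:

```python
import re

# Hypothetical sample: a number at position 0, a standalone 2016,
# and two longer numbers that contain 2016.
sample = "856 then 2016, 20161 and 12016"

lookahead = re.findall(r'\D(?!2016)(\d{3,})\b', sample)
lookbehind = re.findall(r'\D(\d{3,})(?<!2016)\b', sample)

print(lookahead)   # ['12016']  (856 and 20161 are dropped)
print(lookbehind)  # ['20161']  (856 and 12016 are dropped)
```

If only the exact number 2016 should be excluded, the pattern needs a boundary check on both sides of the number.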

Answer 2

You can use

import re
s = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 20161 12016 120162"
print(re.findall(r'(?<!\d)(?!2016(?!\d))\d{3,}', s))

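See the Python demo and the regex demo.

Details

(?<!\d) - no digit is allowed immediately to the left of the current position
(?!2016(?!\d)) - immediately to the right of the current position there must not be a 2016 that is not followed by another digit
\d{3,} - 3 or more digits.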
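The breakdown above can also be written directly into the pattern. The following is a sketch of the same regex in `re.VERBOSE` mode, where whitespace and `#` comments inside the pattern are ignored; the names `pattern` and `matches` are only illustrative.

```python
import re

s = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 20161 12016 120162"

# Same regex as above, split across lines with one comment per part.
pattern = re.compile(r"""
    (?<!\d)          # no digit immediately to the left
    (?!2016(?!\d))   # not a 2016 that stands alone (no digit after it)
    \d{3,}           # three or more digits
""", re.VERBOSE)

matches = pattern.findall(s)
print(matches)  # ['856', '112', '20161', '12016', '120162']
```

Only the standalone 2016 is dropped; 20161, 12016 and 120162 are kept because 2016 is just a part of them.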

An alternative solution with some code:

import re
s = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 20161 12016 120162"
print([x for x in re.findall(r'\d{3,}', s) if x != "2016"])

Here, we extract every chunk of 3 or more digits (re.findall(r'\d{3,}', s)) and then filter out those that are equal to 2016.
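If more than one number has to be excluded, the same extract-then-filter idea extends naturally. This is a rough sketch; the helper name `extract_numbers` and its parameters `excluded` and `min_len` are made up for illustration and do not come from the question.

```python
import re

def extract_numbers(text, excluded=("2016",), min_len=3):
    """Return digit runs of at least min_len digits, skipping exact matches in excluded."""
    blocked = set(excluded)  # set lookup keeps the filter cheap
    # Build \d{N,} for the requested minimum length.
    digit_runs = re.findall(r'\d{%d,}' % min_len, text)
    return [run for run in digit_runs if run not in blocked]

s = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 20161 12016 120162"
print(extract_numbers(s))                            # ['856', '112', '20161', '12016', '120162']
print(extract_numbers(s, excluded=("2016", "112")))  # ['856', '20161', '12016', '120162']
```

Because the comparison is on the whole digit run, numbers that merely contain 2016 are kept.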

Answer 3

Another approach could be:

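import re
st = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 "
# Remove "2016" (or a bare "016") first, then collect the remaining runs of 3+ digits.
print(re.findall(r"\d{3,}", re.sub("((2)?(016))", "", st)))

The output will be: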

['856', '112']

But I think the accepted answer is faster than my suggestion.

Python, regular expression to exclude a number from the matches
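One caveat with the substitution approach: `((2)?(016))` removes "2016" or "016" wherever it occurs, including inside longer numbers. Below is a small sketch of that effect on a made-up string, next to a variant that uses `\b2016\b`. The variant is my own assumption about the intent, not part of the answer above.

```python
import re

# Hypothetical sample: no standalone 2016, but two numbers containing "016".
st = "Location 112 20161 40165"

# Original substitution: also eats digits out of 20161 and 40165.
stripped = re.sub("((2)?(016))", "", st)
print(stripped)                        # Location 112 1 45
print(re.findall(r"\d{3,}", stripped)) # ['112']

# Assumed alternative: only remove 2016 when it stands alone as a word.
safer = re.sub(r"\b2016\b", "", st)
print(re.findall(r"\d{3,}", safer))    # ['112', '20161', '40165']
```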