python中的正则表达式 - 找到一组不间断的数字

时间:2015-12-21 16:21:28

标签: python regex

我写了以下代码:

import re

strings = []

strings.append('merchant ID 1234, device ID 45678, serial# 123456789')
strings.append('merchant ID 8765, user ID 531476, serial# 87654321')
strings.append('merchant ID 1234, device ID 4567, serial# 123456789')
strings.append('merchant ID 1234#56, device ID 45678, serial# 123456789')
strings.append('device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321')




for n in strings:
    expr = re.findall(r'merchant\sID\s\d+|device\sID\s\d+', n);
    if len(expr) == 2:
        print(n)

任务是扫描5个字符串并仅打印带有“商家ID”和“设备ID”的字符串,并且ID号是legid(仅限降级)。因此,从这5个字符串中,它应该只打印第一个,第三个和第五个字符串。 我写的代码也打印出第四个字符串。

如何修复代码以识别1234#56的数字集不合法?

3 个答案:

答案 0 :(得分:1)

以下是针对您的具体案例的示例:您可以使用merchant\sID\s\d+

替换正则表达式中的merchant\sID\s\d+(?=[\s,$])

解释:(?=[\s,$])新添加的部分指定了“后跟空格,逗号或字符串结尾”的前瞻断言。另请参阅:https://docs.python.org/2/library/re.html(搜索“lookahead assertion”)

如果您需要通用解决方案,我担心您首先需要提供更多详细信息,例如:你如何定义“不间断”。

答案 1 :(得分:1)

您可以使用lookaround assertions指定哪些字符可以在数字之前/之后,也可以不在数字之前。

您还可以使用环视来确保两个ID都以任何顺序匹配:

In [9]: for n in strings:
   ...:     print(re.findall(r'(?=.*merchant\sID\s(\d+)\b(?!#)).*device\sID\s(\d+)\b(?!#)
   ...:
[('1234', '45678')]
[]
[('1234', '4567')]
[]
[('8765', '4567')]

测试live on regex101.com

<强>解释

(?=             # Assert that the following can be matched:
 .*             # Any number of characters
 merchant\sID\s # followed by "merchant ID "
 (\d+)          # and a number (put that in group 1)
 \b(?![#])      # but only if that number isn't followed by #
)               # End of lookahead
.*              # Then match the actual string, any number of characters,
device\sID\s    # followed by "device ID "
(\d+)           # and a number (put that in group 2)
\b(?![#])       # but only if that number isn't followed by #

答案 2 :(得分:1)

您可以在此处使用re.match来查找以特定模式开头的字符串:

>>> for s in strings:
...     if re.match('[^s]+ ID \d+, [^s]+ ID \d+,', s):
...         print(s)
... 
merchant ID 1234, device ID 45678, serial# 123456789
merchant ID 1234, device ID 4567, serial# 123456789
device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321

演示并解释模式:https://regex101.com/r/qA9pY7/1
我在此处添加了^来模拟re.match的行为。