Question

I wrote the following code:

import re

strings = []

strings.append('merchant ID 1234, device ID 45678, serial# 123456789')
strings.append('merchant ID 8765, user ID 531476, serial# 87654321')
strings.append('merchant ID 1234, device ID 4567, serial# 123456789')
strings.append('merchant ID 1234#56, device ID 45678, serial# 123456789')
strings.append('device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321')




for n in strings:
    expr = re.findall(r'merchant\sID\s\d+|device\sID\s\d+', n)
    if len(expr) == 2:
        print(n)

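The task is to scan the 5 strings and print only those that contain both a "merchant ID" and a "device ID" whose ID numbers are legit (digits only). So, out of these 5 strings, it should print only the first, third, and fifth. The code I wrote also prints the fourth string.

How can I fix the code so that it recognizes that the digit set 1234#56 is not legit?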

Answer 1

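Here is an example for your specific case: in your regular expression, replace merchant\sID\s\d+

with merchant\sID\s\d+(?=[\s,]|$)

Explanation: the newly added (?=[\s,]|$) is a lookahead assertion meaning "followed by whitespace, a comma, or the end of the string". Note that $ has to sit outside the brackets: inside a character class it would match a literal dollar sign rather than the end of the string. See also: https://docs.python.org/2/library/re.html (search for "lookahead assertion")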
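As a rough sketch of how that lookahead could be dropped into the loop from the question: the version below applies the same check to both merchant and device IDs, which is an assumption on my part (the question only shows a bad merchant ID). The names ID_PATTERN and has_valid_ids are made up for illustration.

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

# The digits must be followed by whitespace, a comma, or the end of the string.
# "$" is written outside the character class so it means "end of string".
ID_PATTERN = re.compile(r'(?:merchant|device)\sID\s\d+(?=[\s,]|$)')


def has_valid_ids(text):
    """Return True if exactly two well-formed merchant/device IDs are found."""
    return len(ID_PATTERN.findall(text)) == 2


matching = [s for s in strings if has_valid_ids(s)]
for s in matching:
    print(s)
```

With the five sample strings this should print the first, third, and fifth. It keeps the question's "exactly two matches" test, so it does not check that one match is a merchant ID and the other a device ID.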

If you need a general solution, I'm afraid you will first have to provide more details, for example how you define "uninterrupted".

Answer 2

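You can use lookaround assertions to specify which characters may or may not appear before/after the number.

You can also use lookaround to make sure that both IDs are matched, in any order: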

In [9]: for n in strings:
   ...:     print(re.findall(r'(?=.*merchant\sID\s(\d+)\b(?!#)).*device\sID\s(\d+)\b(?!#)', n))
   ...:
[('1234', '45678')]
[]
[('1234', '4567')]
[]
[('8765', '4567')]

Test it live on regex101.com.

Explanation

(?=             # Assert that the following can be matched:
 .*             # Any number of characters
 merchant\sID\s # followed by "merchant ID "
 (\d+)          # and a number (put that in group 1)
 \b(?![#])      # but only if that number isn't followed by #
)               # End of lookahead
.*              # Then match the actual string, any number of characters,
device\sID\s    # followed by "device ID "
(\d+)           # and a number (put that in group 2)
\b(?![#])       # but only if that number isn't followed by #
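The commented layout above can be used almost verbatim in Python by compiling with re.VERBOSE, which ignores whitespace and # comments outside character classes (that is why the # being tested is written as [#]). A minimal sketch, where BOTH_IDS and extract_ids are illustrative names rather than anything from the original answer:

```python
import re

# Verbose form of the pattern: comments and whitespace are ignored by re.VERBOSE,
# and [#] keeps the hash sign literal instead of starting a comment.
BOTH_IDS = re.compile(r"""
    (?=                   # Assert that the following can be matched:
      .*merchant\sID\s    # any characters, then "merchant ID "
      (\d+)               # a number (group 1)
      \b(?![#])           # that is not followed by a hash sign
    )                     # End of lookahead
    .*device\sID\s        # Then any characters, then "device ID "
    (\d+)                 # a number (group 2)
    \b(?![#])             # that is not followed by a hash sign
""", re.VERBOSE)


def extract_ids(text):
    """Return (merchant_id, device_id) as strings, or None if either is missing or malformed."""
    match = BOTH_IDS.search(text)
    return match.groups() if match else None


samples = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

for s in samples:
    print(extract_ids(s))
```

Because the merchant ID is matched inside a lookahead, the two IDs can appear in either order, and the merchant ID always ends up in group 1.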

Answer 3

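You can use re.match here to find strings that start with a specific pattern. Note that [^s] is a character class meaning "any character except the letter s", which is what keeps "user" and "serial" from being accepted as an ID label: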

>>> for s in strings:
...     if re.match(r'[^s]+ ID \d+, [^s]+ ID \d+,', s):
...         print(s)
... 
merchant ID 1234, device ID 45678, serial# 123456789
merchant ID 1234, device ID 4567, serial# 123456789
device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321

Demo and explanation of the pattern: https://regex101.com/r/qA9pY7/1
I added ^ there to emulate the behavior of re.match.
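To illustrate the remark about ^, here is a small sketch comparing re.match with re.search on a ^-anchored copy of the same pattern. The function names are hypothetical, and the comparison assumes re.MULTILINE is not set, so ^ only matches at the very start of the string.

```python
import re

PATTERN = r'[^s]+ ID \d+, [^s]+ ID \d+,'

samples = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]


def starts_with_two_ids(text):
    # re.match only tries to match at the beginning of the string.
    return re.match(PATTERN, text) is not None


def starts_with_two_ids_anchored(text):
    # re.search scans the whole string, so "^" is added to pin it to the start.
    return re.search('^' + PATTERN, text) is not None


results = [starts_with_two_ids(s) for s in samples]
anchored_results = [starts_with_two_ids_anchored(s) for s in samples]
print(results)
print(anchored_results)
```

Both lists should agree on these samples, accepting the first, third, and fifth strings.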

Regular expressions in Python - finding an uninterrupted set of digits