I wrote the following code:
import re
strings = []
strings.append('merchant ID 1234, device ID 45678, serial# 123456789')
strings.append('merchant ID 8765, user ID 531476, serial# 87654321')
strings.append('merchant ID 1234, device ID 4567, serial# 123456789')
strings.append('merchant ID 1234#56, device ID 45678, serial# 123456789')
strings.append('device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321')
for n in strings:
    expr = re.findall(r'merchant\sID\s\d+|device\sID\s\d+', n)
    if len(expr) == 2:
        print(n)
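The task is to scan the 5 strings and print only those that contain both a "merchant ID" and a "device ID" whose ID numbers are legit (digits only). So, out of these 5 strings, it should print only the first, third and fifth. The code I wrote also prints the fourth string.
How can I fix the code so that it recognizes that the number 1234#56 is not legit?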
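Answer 0 (score: 1)
Here is an example for your specific case: in place of merchant\sID\s\d+ you can use
merchant\sID\s\d+(?=[\s,]|$)
Explanation: (?=[\s,]|$)
The newly added part is a lookahead assertion meaning "followed by whitespace, a comma, or the end of the string". The $ sits outside the character class because inside [...] it would match a literal dollar sign. The same suffix is needed after device\sID\s\d+. See also: https://docs.python.org/2/library/re.html (search for "lookahead assertion").
If you need a general solution, I'm afraid you will first have to provide more detail, for example how you define "uninterrupted".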
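As a minimal sketch of how the lookahead could be wired into the loop from the question: the pattern below applies the same assertion to both the merchant and the device alternative. Because `$` inside a character class is a literal dollar sign, the end-of-string case is written as a separate alternative. The names `ID_PATTERN` and `has_both_ids` are hypothetical, and checking the set of labels instead of `len(expr) == 2` is my own assumption about what "both IDs" should mean.

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

# The lookahead requires whitespace, a comma, or the end of the string
# right after the digits, so "1234#56" is rejected.
ID_PATTERN = re.compile(r'(merchant|device)\sID\s\d+(?=[\s,]|$)')


def has_both_ids(text):
    """Hypothetical helper: True if text has a valid merchant ID and device ID."""
    labels = ID_PATTERN.findall(text)  # one group, so this is a list of labels
    return set(labels) == {'merchant', 'device'}


matches = [s for s in strings if has_both_ids(s)]
for s in matches:
    print(s)
```

With the five sample strings this should print the first, third and fifth, which is what the question asks for.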
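Answer 1 (score: 1)
You can use lookaround assertions to specify which characters may, or may not, appear before/after the number.
You can also use a lookaround to make sure both IDs are matched, in either order: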
In [9]: for n in strings:
   ...:     print(re.findall(r'(?=.*merchant\sID\s(\d+)\b(?!#)).*device\sID\s(\d+)\b(?!#)', n))
   ...:
[('1234', '45678')]
[]
[('1234', '4567')]
[]
[('8765', '4567')]
Explanation:
(?=                # Assert that the following can be matched:
 .*                # Any number of characters
 merchant\sID\s    # followed by "merchant ID "
 (\d+)             # and a number (put that in group 1)
 \b(?![#])         # but only if that number isn't followed by #
)                  # End of lookahead
.*                 # Then match the actual string, any number of characters,
device\sID\s       # followed by "device ID "
(\d+)              # and a number (put that in group 2)
\b(?![#])          # but only if that number isn't followed by #
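The commented breakdown above can be fed to Python almost as-is by compiling it with `re.VERBOSE`, which ignores whitespace in the pattern and treats `#` as the start of a comment (which is why the literal hash is written as `[#]`). This is only a sketch; the name `BOTH_IDS` and the `results` list are my own additions.

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

# Hypothetical name; same pattern as in the answer, written in verbose mode.
BOTH_IDS = re.compile(r"""
    (?=                 # Assert that the following can be matched:
      .*                #   any number of characters,
      merchant\sID\s    #   followed by the merchant ID label
      (\d+)             #   and a number (group 1),
      \b(?![#])         #   but only if that number isn't followed by a hash
    )                   # End of lookahead
    .*                  # Then match the actual string: any characters,
    device\sID\s        # followed by the device ID label
    (\d+)               # and a number (group 2),
    \b(?![#])           # but only if that number isn't followed by a hash
""", re.VERBOSE)

# One list of (merchant, device) tuples per input string.
results = [BOTH_IDS.findall(s) for s in strings]
for r in results:
    print(r)
```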
Answer 2 (score: 1)
You can use re.match here to find strings that start with a specific pattern:
>>> for s in strings:
...     if re.match(r'[^s]+ ID \d+, [^s]+ ID \d+,', s):
...         print(s)
...
merchant ID 1234, device ID 45678, serial# 123456789
merchant ID 1234, device ID 4567, serial# 123456789
device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321
Demo and explanation of the pattern: https://regex101.com/r/qA9pY7/1
I added ^ there to emulate the behavior of re.match.
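One thing to be aware of: `[^s]+` rejects "user ID" only because the word "user" happens to contain an "s", so any other label without an "s" would be accepted. A possible stricter variant, sketched under the assumption that the two IDs are always the first two fields (as in this answer), names the labels explicitly and uses a backreference so the second label must differ from the first. `PREFIX` is a hypothetical name.

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

# (?!\1) is a negative lookahead on group 1: the second label must not
# repeat the first one, so the pair is always merchant + device.
PREFIX = re.compile(
    r'(merchant|device) ID \d+, (?!\1)(merchant|device) ID \d+,'
)

# re.match anchors at the start of the string, like ^ on regex101.
matches = [s for s in strings if PREFIX.match(s)]
for s in matches:
    print(s)
```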