Question

I'm trying to solve a regex puzzle and I'm... confused. I expected the following:

import re
import fileinput

TEST_DATA = [
    "6",
    "2 ",
    "1 877 2638277 ",
    "91-011-23413627"
]

for line in TEST_DATA:
    print(
        re.sub(
            r'(\d{1,3})[- ](\d{2,3})[- ]+(\d{5,10})',
            r'CountryCode=\1,LocalAreaCode=\2,Number=\3',
            line))

to give me this:

CountryCode=1,LocalAreaCode=877,Number=2638277 
CountryCode=91,LocalAreaCode=011,Number=23413627

Instead, I get this:

6
2 
CountryCode=1,LocalAreaCode=877,Number=2638277 
CountryCode=91,LocalAreaCode=011,Number=23413627

I don't understand why the non-matching lines are being printed.

Answer 1

re.sub returns the string whether or not a substitution took place. From the documentation:

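Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged.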
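The quoted behaviour is easy to check directly. The following is a minimal sketch using the pattern and sample strings from the question; the helper name `format_number` is made up for illustration.

```python
import re

PATTERN = r'(\d{1,3})[- ](\d{2,3})[- ]+(\d{5,10})'
REPL = r'CountryCode=\1,LocalAreaCode=\2,Number=\3'

def format_number(line):
    # Hypothetical helper: a thin wrapper around re.sub with the question's pattern.
    return re.sub(PATTERN, REPL, line)

# No match: the input string comes back unchanged.
print(repr(format_number("6")))
# Match: the matched part is replaced.
print(repr(format_number("91-011-23413627")))
```

So printing the result of re.sub unconditionally will always print something, including the lines that never matched.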

Perhaps you could first check whether there is a match, and only then perform the substitution.

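my_pattern = r'(\d{1,3})[- ](\d{2,3})[- ]+(\d{5,10})'

for line in TEST_DATA:
    # Only substitute (and print) when the pattern matches at the start of the line.
    if re.match(my_pattern, line):
        print(
            re.sub(
                my_pattern,
                r'CountryCode=\1,LocalAreaCode=\2,Number=\3',
                line))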
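An alternative that avoids running the pattern twice is re.subn, which returns both the new string and the number of substitutions made. This is only a sketch: the helper name `rewrite_matching` is hypothetical, and unlike the re.match check it also accepts lines where the pattern appears somewhere other than the start.

```python
import re

PATTERN = r'(\d{1,3})[- ](\d{2,3})[- ]+(\d{5,10})'
REPL = r'CountryCode=\1,LocalAreaCode=\2,Number=\3'

def rewrite_matching(lines):
    # Hypothetical helper: keep only lines where at least one substitution happened.
    results = []
    for line in lines:
        # re.subn returns a (new_string, number_of_substitutions) tuple.
        new_line, count = re.subn(PATTERN, REPL, line)
        if count:
            results.append(new_line)
    return results

for out in rewrite_matching(["6", "2 ", "1 877 2638277 ", "91-011-23413627"]):
    print(out)
```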

Answer 2

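I have to tell you, I really hate re.sub. I don't know why, and I don't have a good explanation, but I avoid it like the plague. I can't even remember it ever working badly for me, I just don't like it...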

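The reason it doesn't produce the expected output is that re.sub returns the string whether or not it matched the regex. It's a bit like "Hello there".replace("foo","bar"): just because it can't find anything to replace doesn't mean it throws your string away. What I would do instead is: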

pattern = r'(?P<country>\d{1,3})[- ](?P<area>\d{2,3})[- ]+(?P<number>\d{5,10})'
text = r"CountryCode={country},LocalAreaCode={area},Number={number}"

for line in TEST_DATA:
    match = re.match(pattern,line)
    if not match: continue
    print(text.format(**match.groupdict()))
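If you would rather keep a re.sub-style template than switch to str.format, a match object can expand one itself via Match.expand. A minimal sketch, assuming the same named groups as above; the helper name `describe` is made up for illustration.

```python
import re

PATTERN = re.compile(
    r'(?P<country>\d{1,3})[- ](?P<area>\d{2,3})[- ]+(?P<number>\d{5,10})')
# \g<name> is the named-group backreference syntax used in replacement templates.
TEMPLATE = r'CountryCode=\g<country>,LocalAreaCode=\g<area>,Number=\g<number>'

def describe(line):
    # Hypothetical helper: returns None for lines that do not match at the start.
    match = PATTERN.match(line)
    return match.expand(TEMPLATE) if match else None

for line in ["6", "2 ", "1 877 2638277 ", "91-011-23413627"]:
    result = describe(line)
    if result is not None:
        print(result)
```

Note that, as with the str.format version, only the matched text is used, so the trailing space in "1 877 2638277 " does not appear in the output.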

Answer 3

Try:

import re    

TEST_DATA = [
    "6",
    "2 ",
    "1 877 2638277 ",
    "91-011-23413627"
]

pattern = r'(\d{1,3})[- ](\d{2,3})[- ]+(\d{5,10})'
rep = r'CountryCode=\1,LocalAreaCode=\2,Number=\3'

for line in TEST_DATA:
    if re.match(pattern, line):
        print(re.sub(pattern, rep, line))
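One thing to keep in mind with this approach: re.match only tries the pattern at the start of the string, while re.sub replaces it wherever it occurs. The sketch below uses a made-up input with a prefix to show the difference; if such lines should also be rewritten, re.search is the check that lines up with what re.sub does.

```python
import re

PATTERN = r'(\d{1,3})[- ](\d{2,3})[- ]+(\d{5,10})'

line = "tel: 91-011-23413627"  # made-up input: the number is not at position 0

# re.match only tries the pattern at the beginning of the string.
print(re.match(PATTERN, line))
# re.search scans the string for the first location where the pattern matches.
print(re.search(PATTERN, line))
# re.sub substitutes wherever the pattern is found.
print(re.sub(PATTERN, 'X', line))
```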

Python re returns non-matching lines
