Question

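I need to find the space after a 3- or 4-digit number in a bunch of file names and replace that space with an underscore. But I can't even seem to match the 4 digits.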

import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r'blue (?P<animal>dog|cat)')
print(p.sub(r'gray \g<animal>', s))

#Gives basically what I want.
the gray dog and gray cat wore blue hats


s = "7053 MyFile.pptx"
p = re.compile(r'[0-9][0-9][0-9][0-9](?P<dig> )')
print(p.sub(r'_\g<dig>', s))

#Takes out the numbers, which I need to keep
_ MyFile.pptx

Every expression I try seems to strip out the number, and I need to keep it.

In the end, I want

7035 MyFile.pptx

to become

7035_MyFile.pptx

Answer 1

To replace a 3- or 4-digit number followed by a space with the same number plus an underscore, the correct regex and replacement syntax is:

re.sub(r"([0-9]{3,4})\s", r"\1_", s)

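You may have misread how groups and backreferences work. Whatever you want to keep in the group has to be inside the parentheses. In your pattern only the space was captured, so the digits were matched and then discarded. If you want to use a named group (which is not necessary here):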

re.sub(r"(?P<dig>[0-9]{3,4})\s", r"\g<dig>_", s)

Or, using a precompiled regex similar to your example:

import re

s = "7053 MyFile.pptx"
p = re.compile(r"(?P<dig>[0-9]{3,4})\s")
print(p.sub(r'\g<dig>_', s))  # 7053_MyFile.pptx

The {3,4} after [0-9] means three or four repetitions of a digit. \s stands for any whitespace character (not just a space).
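As a quick check of the quantifier and of \s, here is a small sketch. The file names are made up for illustration and the variable names are my own:

```python
import re

pattern = re.compile(r"([0-9]{3,4})\s")

# \s matches a tab as well as a plain space
tab_result = pattern.sub(r"\1_", "7053\tMyFile.pptx")
# Three digits satisfy {3,4} just like four do
three_result = pattern.sub(r"\1_", "705 MyFile.pptx")
# Two digits are too few, so the string comes back unchanged
two_result = pattern.sub(r"\1_", "70 MyFile.pptx")

print(tab_result)
print(three_result)
print(two_result)
```

If you only ever want to replace a literal space, put a space in the pattern instead of \s.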

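In fact, as written, searching for just three digits would also match four-digit numbers, because nothing restricts what comes before the match. For the same reason, the {3,4} pattern also matches the last four digits of a longer number. Depending on what you are looking for, you may want to restrict matches by prefixing the pattern with ^ (start of line) or \b (the empty string at a word boundary).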
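To illustrate the difference, here is a minimal sketch comparing the unanchored pattern with \b and ^ variants. The sample names are invented, and which variant is right depends on what your file names actually look like:

```python
import re

unanchored = re.compile(r"([0-9]{3,4})\s")
bounded = re.compile(r"\b([0-9]{3,4})\s")
anchored = re.compile(r"^([0-9]{3,4})\s")

long_name = "1234567 MyFile.pptx"
# Without a boundary, the last four digits of the long number still match
loose = unanchored.sub(r"\1_", long_name)
# \b requires the digits to start at a word boundary, so nothing matches
strict = bounded.sub(r"\1_", long_name)

mid_name = "Report 7053 final.pptx"
# \b accepts a number in the middle of the name; ^ only at the start
mid_bounded = bounded.sub(r"\1_", mid_name)
mid_anchored = anchored.sub(r"\1_", mid_name)

print(loose)
print(strict)
print(mid_bounded)
print(mid_anchored)
```

Note that \b treats letters, digits and the underscore as word characters, so a number directly preceded by one of those will not be matched by the \b variant either.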
