Question

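I am trying to take an already written cpp file and add headers to its list of includes using a Python script. Currently, I build a string containing all the includes I want to add and then use the re module to replace an existing line with that string. All of the includes have "\t" in their name, and this causes problems. Instead of getting the line as intended (#include "abc\type\GenericTypeMT.h"), I get #include "abc ype\GenericTypeMT.h". When I print the string to the console it has the intended format, which leads me to believe this is a re.sub problem rather than a problem with writing to the file. The code is below.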

import re
import string

INCLUDE = "#include \"abc\\type\\"

with open("file.h", "r+") as f:
     a = ""
     b = ""
     for line in f:
         a = a + line
     f.seek(0,0)
     types = open("types.txt", "r+")
     for t in types:
         head = INCLUDE + t.strip() + "MT.h\""
         b = b + head + "\n"
     a = re.sub(r'#include "abc\\type\\GenericTypeMT\.h"', b, a)
     types.close()
     print b
     print a
     f.write(a)

The output of b is:

#include "abc\type\GenericTypeMT.h"
#include "abc\type\ServiceTypeMT.h"
#include "abc\type\AnotherTypeMT.h"

The (truncated) output of a is:

/* INCLUDES *********************************/
#include "abc   ype\GenericTypeMT.h"
#include "abc   ype\ServiceTypeMT.h"
#include "abc   ype\AnotherTypeMT.h"

#include <map>
...

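The closest question to mine is "How to write \t to file using Python", but it differs from mine because my problem seems to stem from the substitution performed by the regular expression, as shown by printing the contents before writing them.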

Answer 1

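The re.sub() function also expands metacharacters (escape sequences) in the replacement string. The \t sequence in your replacement string (two characters, \ and t) is interpreted by the re module as the escape sequence for a tab character: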

>>> import re
>>> re.sub(r'^.', '\\t', 'foo')
'\too'
>>> print(re.sub(r'^.', '\\t', 'foo'))
    oo
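
To tie this back to the question, the following is a minimal sketch that reproduces the effect on a path-like fragment. The PLACEHOLDER text and the variable names are made up for illustration. The second call is an aside: a backslash followed by a letter that is not a known escape (such as \G) is handled differently, and newer Python 3 versions (3.7 and later, to my knowledge) raise re.error for it instead of leaving it alone.

```python
import re

# The replacement holds a literal backslash followed by 't' (8 characters).
replacement = 'abc\\type'
result = re.sub(r'PLACEHOLDER', replacement, 'PLACEHOLDER')

# re.sub() turned the two-character sequence into a single tab character,
# so the result is one character shorter than the replacement.
print(repr(result))
print(len(replacement), len(result))

# Aside: backslash + 'G' is not a known escape. Older versions left it
# untouched; newer versions are expected to raise re.error here.
try:
    re.sub(r'PLACEHOLDER', 'type\\Generic', 'PLACEHOLDER')
except re.error as exc:
    print('re.error:', exc)
```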

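If you use a function as the replacement value, however, this expansion does not take place. Note that this also means placeholders are not processed; you would have to write your own placeholder insertion logic using the match object passed into the function.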

Your code has no placeholders, so a lambda that creates such a function is enough:

a = re.sub(r'#include "abc\\type\\GenericTypeMT\.h"', lambda m: b, a)

A demonstration on the same contrived foo example string:

>>> re.sub(r'^.', lambda m: '\\t', 'foo')
'\\too'
>>> print(re.sub(r'^.', lambda m: '\\t', 'foo'))
\too
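
If the replacement did need a placeholder, the function would have to splice the captured group in by hand. The sketch below assumes a hypothetical pattern that captures the type name and a made-up "Impl" suffix; neither comes from the question.

```python
import re

line = '#include "abc\\type\\GenericTypeMT.h"'

# Hypothetical pattern: capture the type name so it can be reused.
pattern = r'#include "abc\\type\\(\w+)MT\.h"'

def rewrite(match):
    # The returned string is inserted literally: no escape processing and
    # no \1 or \g<1> expansion, so the group is added explicitly.
    return '#include "abc\\type\\' + match.group(1) + 'Impl.h"'

result = re.sub(pattern, rewrite, line)
print(result)
```

Because the function's return value is used verbatim, the backslash before "type" survives without any extra escaping.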

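Unfortunately, the re.escape() function is too greedy here: it adds \ backslashes to more characters than just the replacement metacharacters, so you end up with more backslashes than you started with.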
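
A narrower alternative, which is not part of the answer above, is to double only the backslashes, since the backslash is the character that re.sub() treats specially in a replacement string. The sketch below compares that with re.escape(); the PLACEHOLDER line and the variable names are illustrative assumptions.

```python
import re

source = 'before\nPLACEHOLDER\nafter\n'
b = '#include "abc\\type\\GenericTypeMT.h"\n'

# What the text should look like after a purely literal substitution.
expected = source.replace('PLACEHOLDER\n', b)

# Doubling each backslash makes re.sub() emit it as a single literal one.
doubled = re.sub(r'PLACEHOLDER\n', b.replace('\\', '\\\\'), source)

# re.escape() is meant for patterns: it also escapes characters such as
# '.', and those extra backslashes are left in the output.
escaped = re.sub(r'PLACEHOLDER\n', re.escape(b), source)

print(doubled == expected)
print(escaped == expected)
```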

Note that, because no actual pattern matching is taking place, you may as well just use str.replace() to do the job:

a = a.replace(r'#include "abc\type\GenericTypeMT.h"', b)

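With str.replace(), the \ and . characters are no longer regular expression metacharacters, so they do not need to be escaped either.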
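
Putting the str.replace() approach together, here is a rough sketch of the whole substitution working on in-memory strings rather than files. The function name expand_includes and the sample header text are assumptions made for the example, not taken from the question.

```python
INCLUDE_PREFIX = '#include "abc\\type\\'
TARGET = r'#include "abc\type\GenericTypeMT.h"'

def expand_includes(source, type_names):
    """Replace the GenericTypeMT include with one include per type name."""
    lines = [INCLUDE_PREFIX + name.strip() + 'MT.h"' for name in type_names]
    # str.replace() does no escape processing on either argument.
    return source.replace(TARGET, '\n'.join(lines))

header = ('/* INCLUDES */\n'
          '#include "abc\\type\\GenericTypeMT.h"\n'
          '\n'
          '#include <map>\n')
result = expand_includes(header, ['GenericType', 'ServiceType', 'AnotherType'])
print(result)
```

Joining the lines with "\n" rather than ending each one with a newline keeps the line break that already follows the original include, so no extra blank line is introduced.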
