Question

I am scanning the text of a C file and searching for any comments in it. The comments have the form

/* this is a comment */

The regular expression I use to find comments is

comment = r'\/\*(?:[^*]|\*[^/])*\*\/'

I then do this to scan the file and find the comments:

for line in pstream:
    findComment = re.search(comment, line)
    if findComment:
        Comment = findComment.group(0)
        if isinstance(Comment, str):
            print(Comment)
        if isinstance(line, str):
            print(line)
        line = re.sub(Comment, "", line)
        print(line)

I want to find the comments and remove them from the text of the file.

But the output of the code above is:

/* hello */
#include  /* hello */ "AnotherFile.h"
#include  /* hello */ "AnotherFile.h"

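In the second print of line, I expected /* hello */ to be gone, which I assume would mean the comment had been removed from the file. But my re.sub doesn't seem to do anything to it.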

Any help?

Edit: I'm not sure why the two #include printouts are shown in a lighter color, but to clarify, they are printed just like /* hello */ is.

I tested my re.sub in another file using this code:

import re

line = '#include /* hello */ "file.h"'
Comment = '/* hello */'

line = re.sub(Comment, " ", line)

print(line)

It prints:

#include /* hello */ "file.h"

But I don't want /* hello */ to be there :(

Answer 1

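You are using Comment as a regular expression. Since it may contain special regex metacharacters (and in this case it does: the * in /* and */ is a quantifier), you need to escape them with re.escape.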

Use re.escape(Comment):

line = re.sub(re.escape(Comment), "", line)


The output of the second print is now as expected:

/* hello */
#include  /* hello */ "AnotherFile.h"
#include   "AnotherFile.h"
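To see why the unescaped string fails, here is a small sketch (the variable names are illustrative, not from the question). Read as a pattern, '/* hello */' means "zero or more slashes, a space, hello, zero or more spaces, a slash", so it never matches the literal asterisks in the comment.

```python
import re

line = '#include  /* hello */ "AnotherFile.h"'
comment_text = '/* hello */'

# Unescaped: "*" acts as a quantifier, so the literal comment is never matched
# and the line comes back unchanged.
unescaped = re.sub(comment_text, "", line)

# Escaped: every metacharacter is matched literally, so the comment is removed.
escaped = re.sub(re.escape(comment_text), "", line)

# For a plain literal string, str.replace avoids regex handling altogether.
replaced = line.replace(comment_text, "")

print(unescaped)
print(escaped)
print(replaced)
```

If the text to remove is always a literal string, str.replace is probably the simpler choice; re.escape matters when the literal has to be combined with other regex syntax.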

To make sure the whitespace before the comment is removed as well, you can prepend r"\s*" to the pattern:

line = re.sub(r"\s*" + re.escape(Comment), "", line)
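As a possible simplification, the search-then-substitute round trip can be skipped by passing the comment pattern from the question straight to re.sub. The sketch below assumes comments start and end on the same line and that no string literal contains comment-like text; the helper name strip_comments is hypothetical.

```python
import re

# The comment pattern from the question, prefixed with \s* so that the
# whitespace in front of each comment is removed along with it.
COMMENT = re.compile(r"\s*" + r'\/\*(?:[^*]|\*[^/])*\*\/')


def strip_comments(line):
    """Return the line with every /* ... */ comment on it removed."""
    return COMMENT.sub("", line)


print(strip_comments('#include  /* hello */ "AnotherFile.h"'))
print(strip_comments('int x; /* a */ int y; /* b */'))
```

Because the pattern itself does the matching, there is no extracted string that needs re.escape.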
