Question

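I have some C++ code with both /* */ and // style comments, and I want a way to remove them all automatically. Obviously, an editor (e.g. UltraEdit) with some regular expressions searching for /*, */ and // should do the job. On closer inspection, though, a complete solution is not that simple, because the sequences /* or // may not represent a comment if they sit inside another comment, a string literal, or a character literal. E.g.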

printf(" \" \" " "  /* this is not a comment and is surrounded by an unknown number of double-quotes */");

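is a comment sequence inside double quotes. And determining whether a piece of text lies inside a valid pair of double quotes is not a simple task. While this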

// this is a single line comment /* <--- this does not start a comment block 
// this is a second comment line with an */ within

is a comment sequence inside other comments.

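Is there a more comprehensive way to remove comments from C++ source that takes string literals and comments into account? For example, can we instruct the preprocessor to remove comments without carrying out #include directives?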
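
To make the failure mode concrete, here is a minimal sketch of the naive regular-expression approach described above. The helper name naive_strip and the exact pattern are assumptions chosen for illustration, not something an editor necessarily uses.

```cpp
#include <regex>
#include <string>

// Naive stripper: deletes anything that *looks* like a comment, with no
// awareness of string or character literals. Hypothetical helper name.
std::string naive_strip(const std::string& src) {
    // Either a /* ... */ run (shortest match) or // up to the end of the line.
    static const std::regex comment(R"re(/\*[\s\S]*?\*/|//[^\n]*)re");
    return std::regex_replace(src, comment, "");
}
```

Applied to printf(" /* not a comment */ "); this returns printf("  "); so the contents of the string literal are silently lost, which is exactly the problem the question describes.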

Answer 1

The C preprocessor can remove comments.

Edit:

I have updated this so that we can use macros to expand the #if statements.

> cat t.cpp
/*
 * Normal comment
 */
// this is a single line comment /* <--- this does not start a comment block 
// this is a second comment line with an */ within
#include <stdio.h>

#if __SIZEOF_LONG__ == 4
int bits = 32;
#else
int bits = 16;
#endif

int main()
{
    printf(" \" \" " " /* this is not a comment and is surrounded by an unknown number of double-quotes */");
    /*
     * comment with a single // line comment embedded.
     */
    int x;
    // A single line comment /* Normal embedded */ Comment
}

Because we want the #if statements to expand correctly, we need a list of defines. That is relatively trivial: cpp -E -dM.

Then we pipe the #defines and the original file through the preprocessor, but this time we prevent the includes from being expanded.

> cpp -E -dM t.cpp > /tmp/def
> cat /tmp/def t.cpp | sed -e s/^#inc/-#inc/ | cpp - | sed s/^-#inc/#inc/
# 1 "t.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "t.cpp"






#include <stdio.h>


int bits = 32;




int main()
{
    printf(" \" \" " " /* this is not a comment and is surrounded by an unknown number of double-quotes */");    



    int x;

}
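
The two sed steps in the pipeline above (hide the #include lines before preprocessing, restore them afterwards) can be sketched in C++ as follows. The function names are hypothetical, and like the sed patterns this only recognises lines that start with #inc in the first column.

```cpp
#include <sstream>
#include <string>

// Mirrors sed s/^#inc/-#inc/: prefix '-' so the preprocessor ignores the line.
std::string mask_includes(const std::string& src) {
    std::istringstream in(src);
    std::string line, out;
    while (std::getline(in, line)) {
        if (line.compare(0, 4, "#inc") == 0) out += '-';
        out += line;
        out += '\n';
    }
    return out;
}

// Mirrors sed s/^-#inc/#inc/: drop the '-' marker again.
std::string unmask_includes(const std::string& src) {
    std::istringstream in(src);
    std::string line, out;
    while (std::getline(in, line)) {
        if (line.compare(0, 5, "-#inc") == 0) line.erase(0, 1);
        out += line;
        out += '\n';
    }
    return out;
}
```

The preprocessor run itself still happens between the two calls; this sketch only covers the text munging around it.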

Answer 2

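Our SD C++ Formatter has an option to pretty-print the source text and remove all comments. It uses our full C++ front end to parse the text, so it is not confused by whitespace, line breaks, string literals, or preprocessor issues, nor will it break the code through its formatting changes.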

If you are removing comments, you may be trying to obfuscate the source code. The Formatter also comes in an obfuscating version.

Answer 3

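You can use a rule-based parser (e.g. boost::spirit) to write grammar rules for comments. You will need to decide whether to handle nested comments, depending on your compiler. The semantic action for removing a comment should be quite simple.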
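
To illustrate the nested-comment decision mentioned above, here is a minimal hand-written sketch (deliberately not Boost.Spirit) that treats /* */ as nestable by keeping a depth counter. The function name is hypothetical. Standard C++ does not nest block comments, so this models the non-standard behaviour only, and it ignores string literals and // comments entirely.

```cpp
#include <cstddef>
#include <string>

// Removes /* ... */ comments, allowing them to nest. Each outermost comment
// is replaced by a single space. Literals are NOT recognised in this sketch.
std::string strip_nested_block_comments(const std::string& src) {
    std::string out;
    int depth = 0;  // how many /* are currently open
    for (std::size_t i = 0; i < src.size(); ++i) {
        const bool has_next = i + 1 < src.size();
        if (has_next && src[i] == '/' && src[i + 1] == '*') {
            ++depth;
            ++i;  // consume the '*'
        } else if (depth > 0 && has_next && src[i] == '*' && src[i + 1] == '/') {
            --depth;
            ++i;  // consume the '/'
            if (depth == 0) out += ' ';
        } else if (depth == 0) {
            out += src[i];
        }
    }
    return out;
}
```

For standard, non-nesting behaviour the counter would simply be capped at one, so the first */ always closes the comment.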

Answer 4

Regular expressions are not meant to parse languages; attempting it is frustrating at best.

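You actually need a full-blown parser. You may wish to consider Clang: rewriting is an explicit goal of the Clang library suite, and there are already rewriters implemented that you can draw inspiration from.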

Answer 5

Someone might vote on my own question.

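Thanks to Martin York's idea, I found that in Visual Studio the solution looks quite simple (further testing is needed). Just rename all preprocessor directives to something else (invalid C++ syntax is fine) and run cl.exe with /P: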

cl target.cpp /P

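This generates target.i, which contains the source minus the comments. Rename the directives back to what they were and you are done. You may also need to remove the #line directives generated by cl.exe.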

This works because, according to MSDN, the phases of translation are:

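Character mapping: Characters in the source file are mapped to the internal source representation. Trigraph sequences are converted to their single-character internal representation in this phase.

Line splicing: All lines ending in a backslash (\) and immediately followed by a newline character are joined with the next line in the source file, forming logical lines from the physical lines. Unless it is empty, a source file must end in a newline character that is not preceded by a backslash.

Tokenization: The source file is broken into preprocessing tokens and white-space characters. Each comment in the source file is replaced with one space character. Newline characters are retained.

Preprocessing: Preprocessing directives are executed and macros are expanded into the source file. The #include statement invokes translation, starting with the preceding three translation steps, on any included text.

Character-set mapping: All source character set members and escape sequences are converted to their equivalents in the execution character set. For Microsoft C and C++, both the source and the execution character sets are ASCII.

String concatenation: All adjacent string and wide-string literals are concatenated. For example, "String " "concatenation" becomes "String concatenation".

Translation: All tokens are analyzed syntactically and semantically; these tokens are converted into object code.

Linkage: All external references are resolved to create an executable program or a dynamic-link library.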

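Comments are removed during tokenization, before the preprocessing phase. So just make sure that the preprocessing phase has nothing to work on (all directives are hidden); the output should then be exactly what the first three phases produce.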
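
As an illustration of the line-splicing phase listed above, here is a minimal sketch that joins every backslash-newline pair. The function name is hypothetical, and it models only the logical step: the /P output shown later in this answer still contains backslash-newline pairs outside comments, so the real tool evidently preserves the physical layout there.

```cpp
#include <cstddef>
#include <string>

// Phase 2 in miniature: delete each backslash that is immediately followed by
// a newline, together with that newline, forming one logical line.
std::string splice_lines(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // skip the newline as well
            continue;
        }
        out += src[i];
    }
    return out;
}
```

Because splicing happens before tokenization, a // comment that ends in a backslash swallows the following physical line, which is why that case has to be handled before comments are recognised.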

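For user-defined .h files, use the /FI option to include them manually. The resulting .i file will be the combination of the .cpp and the .h files, without comments. Each part is preceded by a #line directive with the proper file name, so it is easy to split them apart in an editor. If we don't want to split them by hand, we may need some editor's macro or scripting facility to do it automatically.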
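
The splitting step described above could also be scripted outside an editor. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the directives are assumed to look like #line 1 "name" at the start of a line, and the file name is taken exactly as it appears between the quotes (any escaping in paths is not decoded).

```cpp
#include <map>
#include <sstream>
#include <string>

// Groups the lines of a preprocessed .i text by the file named in the most
// recent #line directive. The directives themselves are dropped.
std::map<std::string, std::string> split_by_line_directive(const std::string& text) {
    std::map<std::string, std::string> parts;
    std::istringstream in(text);
    std::string line, current;  // lines before any #line land under ""
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "#line ") == 0) {
            const std::string::size_type open = line.find('"');
            const std::string::size_type close = line.rfind('"');
            if (open != std::string::npos && close > open)
                current = line.substr(open + 1, close - open - 1);
            continue;
        }
        parts[current] += line + '\n';
    }
    return parts;
}
```

Each map entry can then be written back to its own file; text that returns to the same file after an include is appended to that file's entry.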

So now we don't have to care about any preprocessor directives. Even better, line-continuation characters (backslashes) are handled.

E.g.

// vc8.cpp : Defines the entry point for the console application.
//

-#include "stdafx.h"
-#include <windows.h>
-#define NOERR
-#ifdef NOERR
  /* comment here */
 whatever error line is ok
-#else
  some error line if NOERR not defined
      // comment here
-#endif
void pr() ;
int _tmain(int argc, _TCHAR* argv[])
{
    pr();
    return 0;
}

/*comment*/

void pr() {
    printf(" /* "); /* comment inside string " */
    // comment terminated by \
    continue a comment line
    printf(" "); /** " " string inside comment */
    printf/* this is valid comment within line continuation */\
("some weird lines \
with line continuation");
}

After cl.exe vc8.cpp /P it becomes the following, which can then be fed to cl.exe again once the directives are restored (and the #line is removed):

#line 1 "vc8.cpp"



-#include "stdafx.h"
-#include <windows.h>
-#define NOERR
-#ifdef NOERR

 whatever error line is ok
-#else
  some error line if NOERR not defined

-#endif
void pr() ;
int _tmain(int argc, _TCHAR* argv[])
{
    pr();
    return 0;
}



void pr() {
    printf(" /* "); 


    printf(" "); 
    printf\
("some weird lines \
with line continuation");
}
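
The rename and restore steps around the cl.exe /P run can be sketched like this. The function names and the '-' marker are assumptions modelled on the example above. The sketch treats any line whose first non-blank character is '#' as a directive and does not try to detect directives that sit inside block comments or continued lines.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// True if the first non-blank character at or after `from` is '#'.
static bool is_directive(const std::string& line, std::size_t from = 0) {
    const std::size_t p = line.find_first_not_of(" \t", from);
    return p != std::string::npos && line[p] == '#';
}

// Before preprocessing: hide every directive behind a leading '-'.
std::string mask_directives(const std::string& src) {
    std::istringstream in(src);
    std::string line, out;
    while (std::getline(in, line)) {
        if (is_directive(line)) out += '-';
        out += line + '\n';
    }
    return out;
}

// After preprocessing: drop generated #line directives, then unhide the rest.
std::string restore_directives(const std::string& preprocessed) {
    std::istringstream in(preprocessed);
    std::string line, out;
    while (std::getline(in, line)) {
        if (line.compare(0, 5, "#line") == 0) continue;
        if (!line.empty() && line[0] == '-' && is_directive(line, 1)) line.erase(0, 1);
        out += line + '\n';
    }
    return out;
}
```

Run mask_directives on the source, pass the result through cl.exe /P, and run restore_directives on the .i file to get the original code minus its comments.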

Answer 6

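#include <fstream>
#include <iostream>

// Copies input.txt to output.txt with // and /* */ comments removed.
// String and character literals are copied verbatim. Raw string literals,
// digit separators, trigraphs, and backslash-newline are not handled.
int main() {
    std::ifstream fin("input.txt");
    std::ofstream fout("output.txt");
    if (!fin || !fout) {
        std::cerr << "cannot open input.txt or output.txt\n";
        return 1;
    }
    enum State { Code, Slash, Line, Block, BlockStar, Str, Chr };
    State state = Code;
    bool escaped = false;  // previous character in a literal was a backslash
    char ch;
    while (fin.get(ch)) {  // stops cleanly at EOF, unlike while(!fin.eof())
        switch (state) {
        case Code:
            if (ch == '/') { state = Slash; break; }  // may start a comment
            if (ch == '"') state = Str;
            else if (ch == '\'') state = Chr;
            fout << ch;
            break;
        case Slash:
            if (ch == '/') { state = Line; break; }
            if (ch == '*') { state = Block; break; }
            fout << '/';  // a lone '/', e.g. division: keep it
            state = (ch == '"') ? Str : (ch == '\'') ? Chr : Code;
            fout << ch;
            break;
        case Line:
            if (ch == '\n') { fout << ch; state = Code; }  // keep the newline
            break;
        case Block:
            if (ch == '*') state = BlockStar;
            break;
        case BlockStar:  // just saw '*': only "*/" ends the comment
            if (ch == '/') { fout << ' '; state = Code; }
            else if (ch != '*') state = Block;
            break;
        case Str:
        case Chr:
            fout << ch;
            if (escaped) escaped = false;
            else if (ch == '\\') escaped = true;
            else if (ch == (state == Str ? '"' : '\'')) state = Code;
            break;
        }
    }
    if (state == Slash) fout << '/';  // file ended right after a lone '/'
    return 0;
}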

Remove C++ comments from source code
