Question

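Is there an easy way to remove comments from a C/C++ source file without doing any preprocessing? (That is, I think you can use gcc -E, but that will expand macros.) I just want the source code with the comments stripped; nothing else should be changed.

EDIT:

Preference towards an existing tool. I don't want to have to write this myself with regexes; I foresee too many surprises in the code.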

Answer 1

Run the following command on your source file:

gcc -fpreprocessed -dD -E test.c

Thanks to KennyTM for finding the right flags. Here is the result, for completeness:

test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo
/* comments? comments. */
// c++ style comments

gcc -fpreprocessed -dD -E test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo

Answer 2

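It depends on how perverse your comments are. I have a program, scc, to strip C and C++ comments. I also have a test file for it, and I tried GCC (4.2.1 on MacOS X) with the options in the currently selected answer. GCC does not seem to do a perfect job on some of the horribly butchered comments in the test case.

NB: This isn't a real-life problem; people don't write such ghastly code.

Consider (a subset, 36 of 135 lines in total, of) the test case: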

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

On my Mac, the output from GCC (gcc -fpreprocessed -dD -E subset.c) is:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

The output from 'scc' is:

The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.

The regular C comment number 2 has finished.

This is followed by regular C comment number 3.

The output from 'scc -C' (which recognizes double-slash comments) is:

The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.

The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.

The regular C comment number 2 has finished.

This is followed by regular C comment number 3.

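SCC source now available on GitHub

The current version of SCC is 6.60 (dated 2016-06-12), though the Git versions were created on 2017-01-18 (in the US/Pacific time zone). The code is available from GitHub at https://github.com/jleffler/scc-snapshots. You can also find snapshots of the previous releases (4.03, 4.04, 5.05) and two pre-releases (6.16, 6.50); these are all tagged release/x.yz.

The code is still primarily developed under RCS. I'm still working out how I want to use sub-modules or a similar mechanism to handle common library files like stderr.c and stderr.h (which can also be found in https://github.com/jleffler/soq).

SCC version 6.60 attempts to understand C++11, C++14 and C++17 constructs such as binary constants, numeric punctuation, raw strings, and hexadecimal floats. It defaults to C11 mode operation. (Note that the meaning of the -C flag, mentioned above, flipped between version 4.0x, described in the main body of the answer, and version 6.60, which is currently the latest release.)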
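The test cases in this answer all hinge on backslash-newline line splicing: the compiler deletes every backslash that is immediately followed by a newline before it looks for comment delimiters, so a `/` and a `*` separated only by such splices still open a comment. The sketch below illustrates that one idea. It is not taken from scc; the names `skip_splices` and `comment_kind_at` are hypothetical, and the code assumes the caller already knows that position `i` is outside any string or character literal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: starting at index i, step over any run of
   backslash-newline pairs and return the index of the first character
   that is not part of such a splice. */
static size_t skip_splices(const char *s, size_t i)
{
    while (s[i] == '\\' && s[i + 1] == '\n')
        i += 2;
    return i;
}

/* Classify what starts at s[i], looking through line splices:
   0 = not a comment, 1 = block comment, 2 = line comment.
   Assumption: s is NUL-terminated and s[i] is not inside a literal. */
int comment_kind_at(const char *s, size_t i)
{
    assert(s != NULL);
    if (s[i] != '/')
        return 0;
    size_t j = skip_splices(s, i + 1);
    if (s[j] == '*')
        return 1;
    if (s[j] == '/')
        return 2;
    return 0;
}
```

Applied to the first lines of the test cases above, this reports a block comment for the `/`, backslash-newline, `*` opener and no comment for the `/`, backslash-newline, `\/` case, because the second backslash there is not followed by a newline.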

Answer 3

gcc -fpreprocessed -dD -E did not work for me, but this program does it:

#include <stdio.h>

static void process(FILE *f)
{
 int c;
 while ( (c=getc(f)) != EOF )
 {
  if (c=='\'' || c=='"')            /* literal */
  {
   int q=c;
   do
   {
    putchar(c);
    if (c=='\\') putchar(getc(f));
    c=getc(f);
   } while (c!=q && c!=EOF);   /* stop at EOF on an unterminated literal */
   if (c!=EOF) putchar(c);
  }
  else if (c=='/')              /* opening comment ? */
  {
   c=getc(f);
   if (c!='*')                  /* no, recover */
   {
    putchar('/');
    ungetc(c,f);
   }
   else
   {
    int p;
    putchar(' ');               /* replace comment with space */
    c=0;                        /* the '*' of the opener must not also close the comment */
    do
    {
     p=c;
     c=getc(f);
    } while (c!=EOF && (c!='/' || p!='*'));
   }
  }
  else
  {
   putchar(c);
  }
 }
}

int main(int argc, char *argv[])
{
 process(stdin);
 return 0;
}

Answer 4

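There is a stripcmt program that can do this:

StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the grand tradition of Unix text processing programs, it can function either as a FIFO (First In, First Out) filter or accept arguments on the command line.

(per hlovdal's answer to: question about Python code for this)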

Answer 5

Here is a perl script that removes // one-line and /* multi-line */ comments:

  #!/usr/bin/perl

  undef $/;
  $text = <>;

  $text =~ s/\/\/[^\n\r]*(\n\r)?//g;
  $text =~ s/\/\*+([^*]|\*(?!\/))*\*+\///g;

  print $text;

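It takes your source file as a command-line argument. Save the script to a file, say remove_comments.pl, and call it using the following command: perl -w remove_comments.pl [your source file]

Hope it will be helpful.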

Answer 6

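I had this problem as well. I found this tool (Cpp-Decomment), which worked for me. However, it misses the case where a comment line is continued onto the next line. For example: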

// this is my comment \
comment continues ...

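I could not find a way to handle this case in the program, so I just searched for the lines it had missed and fixed them manually. I believe there may be an option for that, or you could change the program's source to do it.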
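For anyone who does want to patch a tool for this case, the rule is small: inside a // comment, a backslash directly before the newline means the comment carries on. A minimal sketch of just that rule follows. It is not Cpp-Decomment's code; `strip_line_comments` is a hypothetical name, and for brevity it ignores string literals and block comments, so a `//` inside a string would be stripped too.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src to dst without // comments and return the output length.
   A backslash immediately before a newline continues the comment onto
   the next line. dst must have room for strlen(src) + 1 bytes.
   Simplification: literals and block comments are not recognised. */
size_t strip_line_comments(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0, o = 0;
    while (src[i] != '\0') {
        if (src[i] == '/' && src[i + 1] == '/') {
            i += 2;
            while (src[i] != '\0' && src[i] != '\n') {
                if (src[i] == '\\' && src[i + 1] == '\n')
                    i += 2;     /* spliced line: still inside the comment */
                else
                    i++;
            }
            /* the newline that ends the comment is copied by the outer loop */
        } else {
            dst[o++] = src[i++];
        }
    }
    dst[o] = '\0';
    return o;
}
```

With the two-line example above placed after some code, both the first comment line and the "comment continues ..." line are dropped and only the final newline is kept.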

Answer 7

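I wrote a C program of about 200 lines, using only the standard C library, that removes comments from C source files. qeatzy/removeccomments

Behavior

C-style comments that span multiple lines or occupy an entire line are zeroed out.
C-style comments in the middle of a line remain unchanged, e.g. void init(/* do initialization */) {...}
C++-style comments that occupy an entire line are zeroed out.
C string literals are respected, by checking for " and \".
Line continuation is handled. If the previous line ends with \, the current line is part of the previous line.
Line numbers remain the same. Zeroed-out lines or parts of lines become empty.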
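To make the "line numbers remain the same" rule concrete, here is a rough sketch of a scanner that drops comments but copies every newline it passes. It is not the removeccomments source; `strip_keep_lines` is a hypothetical name. It also deviates from the list above in one respect: it removes block comments in the middle of a line too, replacing them with a single space so that neighbouring tokens do not merge. Splices inside comment delimiters, trigraphs, and C++11 raw strings are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src to dst without comments while keeping every newline, so line
   numbers in dst match src. Returns the output length. dst must have
   room for strlen(src) + 1 bytes. */
size_t strip_keep_lines(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0, o = 0;
    while (src[i] != '\0') {
        char c = src[i];
        if (c == '"' || c == '\'') {                 /* string or char literal */
            char quote = c;
            dst[o++] = src[i++];
            while (src[i] != '\0' && src[i] != quote) {
                if (src[i] == '\\' && src[i + 1] != '\0')
                    dst[o++] = src[i++];             /* backslash of an escape */
                dst[o++] = src[i++];
            }
            if (src[i] == quote)
                dst[o++] = src[i++];                 /* closing quote */
        } else if (c == '/' && src[i + 1] == '*') {  /* block comment */
            size_t start = o;
            i += 2;
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/')) {
                if (src[i] == '\n')
                    dst[o++] = '\n';                 /* keep the line count */
                i++;
            }
            if (src[i] != '\0')
                i += 2;                              /* skip the closing delimiter */
            if (o == start)
                dst[o++] = ' ';                      /* same-line comment: keep tokens apart */
        } else if (c == '/' && src[i + 1] == '/') {  /* line comment */
            i += 2;
            while (src[i] != '\0' && src[i] != '\n') {
                if (src[i] == '\\' && src[i + 1] == '\n') {
                    dst[o++] = '\n';                 /* continued comment: keep only the newline */
                    i += 2;
                } else {
                    i++;
                }
            }
        } else {
            dst[o++] = src[i++];
        }
    }
    dst[o] = '\0';
    return o;
}
```

Because every newline in the input is emitted exactly once, a compiler diagnostic on the stripped file still points at the right line of the original.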

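Test & Profiling

I tested it with the largest CPython source file, which contains many comments. In this case it does the job correctly and is 2-5 times faster than gcc.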

time gcc -fpreprocessed -dD -E Modules/unicodeobject.c > res.c 2>/dev/null
time ./removeccomments < Modules/unicodeobject.c > result.c

Usage

/path/to/removeccomments < input_file > output_file

Answer 8

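I believe that with a single statement you can easily remove comments from C:

perl -i -pe 's{/\*.*?\*/}{}g' file.c      (removes /* C-style */ comments that start and end on the same line)
perl -i -pe 's{//.*}{}' file.cpp          (removes // C++-style comments)

The only problem with these commands is that they cannot remove comments that span more than one line, and they do not know about string literals. Using the same regex you can easily implement the logic for removing multi-line comments as well.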

Answer 9

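Recently I wrote some Ruby code to solve this problem. I have considered the following exceptions:

comments inside strings
a multi-line comment on a single line (fixing the greedy match)
multi-line comments spread over multiple lines

Here is the code:

It uses the following placeholders to preprocess each line, in case the comment markers appear inside strings. If one of these placeholders appears in your code, uh, bad luck. You can replace them with more complex strings.

MUL_REPLACE_LEFT = "MUL_REPLACE_LEFT"
MUL_REPLACE_RIGHT = "MUL_REPLACE_RIGHT"
SIG_REPLACE = "SIG_REPLACE"

Usage: ruby -w inputfile outputfile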

Answer 10

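I know it's late, but I thought I'd share my code from my first attempt at writing a compiler.

Note: this does not account for "*/" inside a multi-line comment, e.g. /*...."*/"...*/. Then again, gcc 4.8.1 doesn't either.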

#include <stdio.h>

void function_removeComments(char *pchar_sourceFile, long long_sourceFileSize)
{
    long long_sourceFileIndex = 0;
    long long_logIndex = 0;

    int int_EOF = 0;

    for (long_sourceFileIndex=0; long_sourceFileIndex < long_sourceFileSize;long_sourceFileIndex++)
    {
        if (pchar_sourceFile[long_sourceFileIndex] == '/' && int_EOF == 0)
        {
            long_logIndex = long_sourceFileIndex;  // log "possible" start of comment

            if (long_sourceFileIndex+1 < long_sourceFileSize)  // array bounds check given we want to peek at the next character
            {
                if (pchar_sourceFile[long_sourceFileIndex+1] == '*') // multiline comment
                {
                    for (long_sourceFileIndex+=2;long_sourceFileIndex < long_sourceFileSize; long_sourceFileIndex++)
                    {
                        if (long_sourceFileIndex+1 < long_sourceFileSize && pchar_sourceFile[long_sourceFileIndex] == '*' && pchar_sourceFile[long_sourceFileIndex+1] == '/')
                        {
                            // since we've found the end of multiline comment
                            // we want to increment the pointer position two characters
                            // accounting for "*" and "/"
                            long_sourceFileIndex+=2;  

                            break;  // terminating sequence found
                        }
                    }

                    // didn't find terminating sequence so it must be eof.
                    // set file pointer position to initial comment start position
                    // so we can display file contents.
                    if (long_sourceFileIndex >= long_sourceFileSize)
                    {
                        long_sourceFileIndex = long_logIndex;

                        int_EOF = 1;
                    }
                }
                else if (pchar_sourceFile[long_sourceFileIndex+1] == '/')  // single line comment
                {
                    // since we know its a single line comment, increment file pointer
                    // until we encounter a new line or its the eof 
                    for (long_sourceFileIndex++; pchar_sourceFile[long_sourceFileIndex] != '\n' && pchar_sourceFile[long_sourceFileIndex] != '\0'; long_sourceFileIndex++);
                }
            }
        }

        printf("%c",pchar_sourceFile[long_sourceFileIndex]);
     }
 }

Answer 11

#include <stdio.h>

int main(void)
{
        int c;                  /* int rather than char, so EOF is representable */
        int tmp = '\0';         /* a pending '/' that might open a comment */
        while((c = getchar()) != EOF) {
                if(tmp) {
                        tmp = '\0';
                        if(c == '/') {
                                /* line comment: discard up to the end of the line */
                                while((c = getchar()) != '\n' && c != EOF);
                                putchar('\n');
                        }else if(c == '*') {
                                /* block comment: discard up to the closing star-slash */
                                int prev = '\0';
                                while((c = getchar()) != EOF && !(prev == '*' && c == '/'))
                                        prev = c;
                        }else {
                                putchar('/');   /* the pending '/' was ordinary code */
                                putchar(c);
                        }
                        continue;
                }
                if(c == '/') {
                        tmp = c;
                } else {
                        putchar(c);
                }
        }
        if(tmp)
                putchar('/');           /* input ended right after a lone '/' */
        return 0;
}

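This program works for both kinds of comments, i.e. // and /* ..... */. It reads from standard input and does not treat string literals specially.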
