Lex program to remove comments from an input file

Date: 2018-09-12 22:17:57

Tags: c regex comments lex

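I am trying to remove every form of comment from an input file, but I cannot figure out how to remove one particular form: "{comment}". I know this site has many regex examples for removing multi-line and single-line comments, but I could not adapt them to this case.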

Input:

       int j=100;
       /* comment needs to be removed*/
       int c = 200;


      /*
       *comment needs to be removed 
       */

      count = count + 1;

     {comment needs to be removed}

      i++;

Expected output:

int j=100;
int c = 200;
count = count + 1;
i++;

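I can already remove the first two kinds of comments, but not the last one. I tried the regex "{}".*, but it does not match my last comment, {comment}. Is there a regex that handles this, or would I be better off writing a function in C to deal with this case?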
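If you go the C-function route, one option is a small state machine instead of a regex. This is only a sketch: strip_comments is a hypothetical name, not something from the post. It assumes, as in the sample input, that every { outside a string or character literal opens a comment that ends at the next } (no nesting), so it is not suitable for real C code that contains brace blocks. It also ignores backslash line continuations in // comments.

```c
#include <string.h>  // size_t

// Hypothetical helper, not from the original post: copies src into dst with
// block comments, line comments and {...} comments removed, and returns the
// length of the result. String and character literals are copied unchanged,
// so comment markers or braces inside quotes are kept.
// dst must have room for strlen(src) + 1 bytes.
static size_t strip_comments(const char *src, char *dst)
{
    enum { CODE, BLOCK, LINE, BRACE, STR, CHR } state = CODE;
    const char *p = src;
    char *out = dst;

    while (*p) {
        switch (state) {
        case CODE:
            if (p[0] == '/' && p[1] == '*') { state = BLOCK; p += 2; continue; }
            if (p[0] == '/' && p[1] == '/') { state = LINE;  p += 2; continue; }
            if (p[0] == '{') { state = BRACE; p++; continue; }  // assumption: every '{' opens a comment
            if (p[0] == '"')  state = STR;
            if (p[0] == '\'') state = CHR;
            *out++ = *p++;
            break;
        case BLOCK:   // skip until the closing star-slash
            if (p[0] == '*' && p[1] == '/') { state = CODE; p += 2; }
            else p++;
            break;
        case LINE:    // skip to the end of the line but keep the newline itself
            if (*p == '\n') state = CODE;
            else p++;
            break;
        case BRACE:   // skip up to and including the first '}' (no nesting)
            if (*p == '}') state = CODE;
            p++;
            break;
        case STR:
        case CHR:     // copy literal text, honouring backslash escapes
            if (p[0] == '\\' && p[1] != '\0') {
                *out++ = *p++;                  // the backslash
            } else if ((state == STR && *p == '"') || (state == CHR && *p == '\'')) {
                state = CODE;                   // closing quote
            }
            *out++ = *p++;
            break;
        }
    }
    *out = '\0';
    return (size_t)(out - dst);
}
```

The newline and indentation around a removed comment are left in place, so the sample input comes out with blank lines where the comments were; a second pass that drops empty lines would be needed to get the expected output exactly.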

2 answers:

Answer 0 (score: 0)

I don't know what kind of comments go inside {}, but you should be careful.

Try this regex:

\/\*[\s\S]*?\*\/|{[^{}]*?}

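Note that lex/flex patterns, like POSIX extended regular expressions (EREs), have no lazy *? quantifier, and in lex a quoted string such as "{}" matches its characters literally, so "{}".* only matches text that begins with the two characters {}. The C sketch below adapts this answer's pattern to POSIX ERE via <regex.h>, replacing the lazy [\s\S]*? with the standard construction that cannot run past the first */. The adaptation and the name remove_matches are assumptions, not part of the answer; in a lex rule the brace part would presumably be written as "{"[^{}]*"}".

```c
#include <regex.h>
#include <string.h>

// POSIX ERE adaptation of answer 0's pattern (an assumption, not the answer's
// exact regex): EREs have no lazy *? and no \s\S, so the block-comment part
// uses the classic form that can never run past the first star-slash.
static const char *COMMENT_ERE =
    "/\\*([^*]|\\*+[^*/])*\\*+/"   // block comment, may span lines
    "|[{][^{}]*[}]";               // {...} comment, no nested braces

// Hypothetical helper: copies src into dst with every match of COMMENT_ERE
// deleted. dst needs strlen(src) + 1 bytes. Returns 0, or -1 if regcomp fails.
static int remove_matches(const char *src, char *dst)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, COMMENT_ERE, REG_EXTENDED) != 0)
        return -1;

    const char *p = src;
    char *out = dst;
    // Each pass finds the next comment, copies the text before it, then skips it.
    while (regexec(&re, p, 1, &m, 0) == 0) {
        memcpy(out, p, (size_t)m.rm_so);
        out += m.rm_so;
        p += m.rm_eo;
    }
    strcpy(out, p);   // copy whatever follows the last comment
    regfree(&re);
    return 0;
}
```

Like the original regex, this does not know about string literals, so a "/*" or "{" inside quotes would also be removed.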

Answer 1 (score: 0)

Note: for every regex below, replace each match with $2 (capture group 2), which writes the non-comment text back. This effectively removes all comments.

This is a standard C++ comment parser,
in an extended version that preserves formatting.

Raw:

(?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)

Delimited (/regex/):

/(?m)((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/))|(?=\r?\n))))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:\/\*|\/\/))|[^\/"'\\\r\n]*))+|[^\/"'\\\r\n]+)+|[\S\s][^\/"'\\\r\n]*)/

PCRE demo: https://regex101.com/r/UldYK5/1
Python demo: https://regex101.com/r/avfSfB/1
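The "write back group 2" idea can be tried in C with POSIX <regex.h>, but POSIX EREs lack (?:...), (?m) and lookahead, so this sketch uses a much simpler pattern than the one above and does not preserve formatting. Because every group in an ERE captures, the group numbers differ from the answer's: the code checks whether the comment group (group 1) took part in the match and writes the whole match back otherwise, which has the same effect as replacing with $2. It relies on POSIX leftmost-longest matching, so that a comment beats a lone / and a string literal beats a single quote character. The pattern and the name strip_keep_code are assumptions for illustration.

```c
#include <regex.h>
#include <string.h>

// Simplified POSIX ERE token pattern (an assumption, much simpler than the
// answer's PCRE regex: no formatting preservation, no line continuations).
// Group 1 is the comment alternative; everything else is non-comment text.
static const char *TOKEN_ERE =
    "(/\\*([^*]|\\*+[^*/])*\\*+/|//[^\n]*)"   // group 1: block or line comment
    "|\"([^\"\\\\]|\\\\.)*\""                   // double-quoted string
    "|'([^'\\\\]|\\\\.)*'"                      // character literal
    "|[^/\"']+"                                 // run of ordinary characters
    "|.";                                       // any other single character

// Hypothetical helper: scans src token by token and writes back only the
// tokens that are not comments, the ERE analogue of replacing with $2.
// dst needs strlen(src) + 1 bytes. Returns 0, or -1 if regcomp fails.
static int strip_keep_code(const char *src, char *dst)
{
    regex_t re;
    regmatch_t m[2];   // m[0]: whole match, m[1]: comment group
    if (regcomp(&re, TOKEN_ERE, REG_EXTENDED) != 0)
        return -1;

    const char *p = src;
    char *out = dst;
    // The final "." alternative always matches, so each match starts at p;
    // leftmost-longest matching picks the longest alternative there.
    while (*p != '\0' && regexec(&re, p, 2, m, 0) == 0) {
        size_t len = (size_t)(m[0].rm_eo - m[0].rm_so);
        if (m[1].rm_so == -1) {          // not a comment: keep it
            memcpy(out, p + m[0].rm_so, len);
            out += len;
        }
        p += m[0].rm_eo;
    }
    *out = '\0';
    regfree(&re);
    return 0;
}
```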

----------------------------------------------------------

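Here is a modified version of the above that also removes your { .. } comments.
This is not recommended, because {} is part of C syntax (braces delimit blocks).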

Raw:

(?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//|\{)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//|\{))|(?=\r?\n))|\{[\S\s]*?\}(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//|\{)))?))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:/\*|//|\{))|[^/"'\\\r\n{]*))+|[^/"'\\\r\n{]+)+|[\S\s][^/"'\\\r\n{]*)

Delimited (/regex/):

/(?m)((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/|\{)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/|\{))|(?=\r?\n))|\{[\S\s]*?\}(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/|\{)))?))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:\/\*|\/\/|\{))|[^\/"'\\\r\n{]*))+|[^\/"'\\\r\n{]+)+|[\S\s][^\/"'\\\r\n{]*)/

PCRE demo (with sample text): https://regex101.com/r/xHTua7/1
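To see why treating braces as comments is risky on real C, here is a small hypothetical check (first_brace_match is not from the answer) that uses only the brace part of the idea, written in POSIX ERE form: on an ordinary if statement it matches the block body, which a comment stripper would then delete.

```c
#include <regex.h>

// Hypothetical check, not from the answer: returns the length of the first
// match of the brace-comment pattern in src (storing its offset in *start),
// or -1 if there is no match or the pattern fails to compile.
static int first_brace_match(const char *src, regoff_t *start)
{
    regex_t re;
    regmatch_t m;
    // POSIX ERE form of the brace part: '{', no braces inside, then '}'
    if (regcomp(&re, "[{][^{}]*[}]", REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, src, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;
    *start = m.rm_so;
    return (int)(m.rm_eo - m.rm_so);
}
```

For example, first_brace_match("if (x > 0) { y = 1; }", &s) returns 10 with s == 11: the whole body "{ y = 1; }" is classified as a comment.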

Readable version, with comments:

    (?m)                             # Multi-line modifier
    (                                # (1 start), Comments 
         (?:
              (?: ^ [ \t]* )?                  # <- To preserve formatting
              (?:
                   /\*                              # Start /* .. */ comment
                   [^*]* \*+
                   (?: [^/*] [^*]* \*+ )*
                   /                                # End /* .. */ comment
                   (?:                              # <- To preserve formatting 
                        [ \t]* \r? \n                                      
                        (?=
                             [ \t]*                  
                             (?:
                                  \r? \n 
                               |  /\*
                               |  // 
                               |  \{                               # Added:  for {} comments
                             )
                        )
                   )?
                |                                 # or,
                   //                               # Start // comment
                   (?:                              # Possible line-continuation
                        [^\\] 
                     |  \\ 
                        (?: \r? \n )?
                   )*?
                   (?:                              # End // comment
                        \r? \n                               
                        (?=                              # <- To preserve formatting
                             [ \t]*                          
                             (?:
                                  \r? \n 
                               |  /\*
                               |  // 
                               |  \{                               # Added:  for {}  comments
                             )
                        )
                     |  (?= \r? \n )
                   )
                |                                 # or,
                   \{                               # Added:  Start { .. } comment
                   [\S\s]*? 
                   \}                               # Added:  End { .. } comment
                   (?:                              # <- To preserve formatting 
                        [ \t]* \r? \n                                      
                        (?=
                             [ \t]*                  
                             (?:
                                  \r? \n 
                               |  /\*
                               |  // 
                               |  \{                               # Added:  for {} comments
                             )
                        )
                   )?
              )
         )+                               # Grab multiple comment blocks if need be
    )                                # (1 end)

 |                                 ## OR

    (                                # (2 start), Non - comments 
         # Quotes
         # ======================
         (?:                              # Quote and Non-Comment blocks
              "
              [^"\\]*                          # Double quoted text
              (?: \\ [\S\s] [^"\\]* )*
              "
           |                                 # --------------
              '
              [^'\\]*                          # Single quoted text
              (?: \\ [\S\s] [^'\\]* )*
              ' 
           |                                 # --------------

              (?:                              # Qualified linebreaks
                   \r? \n                           
                   (?:
                        (?=                              # If comment ahead just stop
                             (?: ^ [ \t]* )?
                             (?:
                                  /\*
                               |  // 
                               |  \{                               # Added:  for {} comments
                             )
                        )
                     |                                 # or,
                                                         # Added:  [^{] for {} comments
                        [^/"'\\\r\n{]*                   # Chars which don't start a comment, string, escape,
                                                         # or line continuation (escape + newline)
                   )
              )+
           |                                 # --------------
                                               # Added:  [^{] for {} comments
              [^/"'\\\r\n{]+                   # Chars which don't start a comment, string, escape,
                                               # or line continuation (escape + newline)

         )+                               # Grab multiple instances

      |                                 # or,
         # ======================
         # Pass through

         [\S\s]                           # Any other char
                                          # Added:  [^{] for {} comments
         [^/"'\\\r\n{]*                   # Chars which don't start a comment, string, escape,
                                          # or line continuation (escape + newline)

    )                                # (2 end), Non - comments