Regex to remove single-line and multi-line comments from C code

Time: 2014-08-20 21:06:31

Tags: python regex

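I have the following regex for removing multi-line comments, but I'm struggling to figure out how to also remove comments that start with //.

When I add (//.*) to the regex, it never seems to work.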

 pattern = r"""
                        ##  --------- COMMENT ---------
       /\*              ##  Start of /* ... */ comment
       [^*]*\*+         ##  Non-* followed by 1-or-more *'s
       (                ##
         [^/*][^*]*\*+  ##
       )*               ##  0-or-more things which don't start with /
                        ##    but do end with '*'
       /                ##  End of /* ... */ comment
                        ##
        |               ## --------- COMMENT ---------
         (//.*)         ## Start of // comment
                        ##
     |                  ##  -OR-  various things which aren't comments:
       (                ##
                        ##  ------ " ... " STRING ------
         "              ##  Start of " ... " string
         (              ##
           \\.          ##  Escaped char
         |              ##  -OR-
           [^"\\]       ##  Non "\ characters
         )*             ##
         "              ##  End of " ... " string
       |                ##  -OR-
                        ##
                        ##  ------ ' ... ' STRING ------
         '              ##  Start of ' ... ' string
         (              ##
           \\.          ##  Escaped char
         |              ##  -OR-
           [^'\\]       ##  Non '\ characters
         )*             ##
         '              ##  End of ' ... ' string
       |                ##  -OR-
                        ##
                        ##  ------ ANYTHING ELSE -------
         .              ##  Any other char
         [^/"'\\]*      ##  Chars which don't start a comment, string
       )                ##    or escape

"""

Can someone tell me where I'm going wrong? I even tried the following regex:

//[^\r\n]*$

But that doesn't work either.

1 answer:

Answer 0 (score: 1)

Try one of these...

They both capture comments and non-comments.


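This one preserves formatting and uses no modifiers.
From a find/while loop, store Group 1 (comments) in a new file and
replace with Group 2 (non-comments) in the original file. Adjust the regex line break as needed, i.e. change \n to \r\n, etc.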

   # (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)


   (                                # (1 start), Comments 
        /\*                              # Start /* .. */ comment
        [^*]* \*+
        (?: [^/*] [^*]* \*+ )*
        /                                # End /* .. */ comment
     |  
        //                               # Start // comment
        (?: [^\\] | \\ \n? )*?           # Possible line-continuation
        \n                               # End // comment
   )                                # (1 end)
|  
   (                                # (2 start), Non - comments 
        "
        (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
        "
     |  '
        (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
        ' 
     |  [\S\s]                           # Any other char
        [^/"'\\]*                        # Chars which don't start a comment, string, escape,
                                         # or line continuation (escape + newline)
   )                                # (2 end)
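
Since the question is tagged python, here is a minimal sketch of the find loop described above, using the one-line form of this regex with Python's built-in `re` module. The names `COMMENT_RE`, `split_comments` and `sample` are made up for illustration, and comments are returned as a list rather than written to a file.

```python
import re

# One-line form of the regex above, compiled without any flags.
# Group 1 = comments, group 2 = non-comments (strings and other code).
COMMENT_RE = re.compile(
    r"(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)"
    r"""|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)"""
)


def split_comments(source):
    """Return (code_without_comments, list_of_comments)."""
    comments, code = [], []
    for m in COMMENT_RE.finditer(source):
        if m.group(1) is not None:
            comments.append(m.group(1))   # a /* ... */ or // comment
        else:
            code.append(m.group(2))       # string literal or other code
    return "".join(code), comments


sample = 'int a = 1; // one\nchar *s = "// not a comment"; /* two */\n'
stripped, found = split_comments(sample)
print(stripped)
print(found)
```

Two things to keep in mind with this version: the `//` branch consumes the terminating newline, so code that follows a line comment moves up onto the comment's line, and a `//` comment on the very last line with no trailing newline is not matched as a comment.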

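Last rework:
Does a better job of preserving formatting.
The formatting issue with newlines is now handled at the tail end of the comment. While this solves the problem of text on either side of a comment being concatenated, it does leave an occasional blank line where a comment used to be. For 98% of comments this won't be a problem, but it's time to leave this dead dog alone.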

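This one preserves formatting. It uses the multi-line regex modifier (be sure to set it). Otherwise it is used the same way as the one above.
This assumes your engine supports \h (horizontal whitespace). If not, let me know.
Adjust the regex line break as needed, i.e. change \n to \r\n, etc.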

   #  ((?:(?:^\h*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:\h*\n(?=\h*(?:\n|/\*|//)))?|//(?:[^\\]|\\\n?)*?(?:\n(?=\h*(?:\n|/\*|//))|(?=\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\\s]*)

   (                                # (1 start), Comments 
        (?:
             (?: ^ \h* )?                     # <- To preserve formatting
             (?:
                  /\*                              # Start /* .. */ comment
                  [^*]* \*+
                  (?: [^/*] [^*]* \*+ )*
                  /                                # End /* .. */ comment
                  (?:
                       \h* \n                                      
                       (?=                              # <- To preserve formatting 
                            \h*                              # <- To preserve formatting
                            (?: \n | /\* | // )              # <- To preserve formatting
                       )
                  )?                               # <- To preserve formatting
               |  
                  //                               # Start // comment
                  (?: [^\\] | \\ \n? )*?           # Possible line-continuation
                  (?:                              # End // comment
                       \n                               
                       (?=                              # <- To preserve formatting
                            \h*                              # <- To preserve formatting
                            (?: \n | /\* | // )              # <- To preserve formatting
                       )
                    |  (?= \n )
                  )
             )
        )+                               # Grab multiple comment blocks if need be
   )                                # (1 end)

|                                 ## OR

   (                                # (2 start), Non - comments 
        "
        (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
        "
     |  '
        (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
        ' 
     |  [\S\s]                           # Any other char
        [^/"'\\\s]*                      # Chars which don't start a comment, string, escape,
                                         # or line continuation (escape + newline)
   )                                # (2 end)
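
Python's built-in `re` module does not support `\h`, so the sketch below swaps it for `[ \t]` (an assumption: only spaces and tabs count as horizontal whitespace here) and sets `re.MULTILINE` so that `^` matches at each line start. The names `PATTERN`, `FORMAT_RE` and `strip_comments` are made up for illustration.

```python
import re

# One-line form of the regex above; \h is replaced because Python's re
# module has no such escape. [ \t] is an assumed stand-in.
PATTERN = (
    r"((?:(?:^\h*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"
    r"(?:\h*\n(?=\h*(?:\n|/\*|//)))?"
    r"|//(?:[^\\]|\\\n?)*?(?:\n(?=\h*(?:\n|/\*|//))|(?=\n))))+)"
    r"""|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\\s]*)"""
).replace(r"\h", r"[ \t]")

# The multi-line modifier is required for the ^ anchor in group 1.
FORMAT_RE = re.compile(PATTERN, re.MULTILINE)


def strip_comments(source):
    """Drop group 1 (comments) and keep group 2 (non-comments)."""
    return FORMAT_RE.sub(lambda m: m.group(2) or "", source)


print(strip_comments('int a = 1; // one\n/* block */\nint b = 2;\n'))
```

Here the newline after a trailing `//` comment is left in place, so the following line stays on its own line, while a line that held only a comment is removed together with the newline before it.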