I have a header file like this:
/*
* APP 180-2 ALG-254/258/772 implementation
* Last update: 03/01/2006
* Issue date: 08/22/2004
*
* Copyright (C) 2006 Somebody's Name here
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. Neither the name of the project nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef HEADER_H
#define HEADER_H
/* More comments and C++ code here. */
#endif /* End of file. */
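I would like to extract only the contents of the first C-style comment, removing the "*" at the start of each line, to get a file containing the following: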
APP 180-2 ALG-254/258/772 implementation
Last update: 03/01/2006
Issue date: 08/22/2004
Copyright (C) 2006 Somebody's Name here
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the project nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
Please suggest a simple way to do this on Unix using Python, Perl, sed, or anything else. A one-liner is preferred.
Answer 0 (score: 5)
This should work for you:
sed -n '/\*\//q; /^\/\*/d; s/^ \* \?//p' <file.h >comment.txt
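Here is an explanation: sed (as you probably know) is a command that runs through a file, applying a list of rules to each line. Each rule consists of a "selector" and a command that is applied to the line only if the selector matches.
The first rule has the selector /\*\//. This is a regular-expression selector; it matches any line containing the characters */. Both characters are backslash-escaped: the asterisk because it is a regex metacharacter, and the slash because it would otherwise end the pattern. (I am assuming this will only match the closing line of the comment in your case, and that the whole line should be dropped.) The command is q, which means "quit": sed simply stops. Normally it would print the line first, but I supplied the -n option, which means "don't print unless explicitly told to."
The second rule has the selector /^\/\*/, again a regular-expression selector, which matches the characters /* at the start of a line. Again, I am assuming this line contains no part of the comment text. The d command tells sed to delete this line and move on to the next one.
The final rule has no selector, so it applies to every line (unless an earlier command keeps processing from reaching it). Its command is the substitute command s/PATTERN/REPLACEMENT/, which finds text in the line that matches a pattern and replaces it with the replacement text. The pattern here is ^ \* \?, which matches a space, an asterisk, and 0 or 1 spaces, but only at the start of the line. The replacement is empty, so sed just deletes the leading space-asterisk-(space)? sequence. The trailing p is actually a flag on the substitute command that tells sed to print the result of the substitution; it is needed because of the -n option.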
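For readers who find the three sed rules easier to follow as ordinary control flow, here is a minimal Python sketch that mirrors them one by one. The function name extract_first_comment and the shortened sample input are made up for illustration; it assumes, as the sed pattern does, that comment lines start with a space followed by an asterisk.

```python
import re

def extract_first_comment(lines):
    """Mirror the three sed rules on an iterable of lines (hypothetical helper)."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if "*/" in line:            # rule 1: /\*\//q   stop at the closing line
            break
        if line.startswith("/*"):   # rule 2: /^\/\*/d  skip the opening line
            continue
        # rule 3: s/^ \* \?//p  keep the line only if the substitution happened
        stripped, count = re.subn(r"^ \* ?", "", line)
        if count:
            out.append(stripped)
    return out

# Shortened, illustrative input in the same shape as the header above.
sample = [
    "/*\n",
    " * Last update: 03/01/2006\n",
    " *\n",
    " * All rights reserved.\n",
    " */\n",
    "#ifndef HEADER_H\n",
]
print("\n".join(extract_first_comment(sample)))
```

As with the sed version, a bare " *" line comes out as an empty line, and nothing after the closing line of the comment is read.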
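Answer 1 (score: 4)
Pyparsing includes built-in patterns for matching the comment formats of various languages. Use cStyleComment and scanString to find the first comment in the source file; the rest is just string functions: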
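from pyparsing import cStyleComment
c_source_file = "file.h"  # path to the header; adjust as needed
with open(c_source_file) as f:
    c_src = f.read()
# scanString is a generator; next() takes only the first match's text
cmt = next(cStyleComment.scanString(c_src))[0][0]
# drop the three-character " * " prefix from every line
lines = [l[3:] for l in cmt.splitlines()]
print('\n'.join(lines))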
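scanString is a generator that returns each match before moving on to the next one, so only the first comment is processed. With the sample code, this returns: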
APP 180-2 ALG-254/258/772 implementation
Last update: 03/01/2006
Issue date: 08/22/2004
Copyright (C) 2006 Somebody's Name here
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the project nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
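The fixed slice l[3:] assumes every comment line starts with exactly three prefix characters. If that cannot be relied on, the following is a rough sketch of the same idea using only the standard re module instead of pyparsing. The function name first_c_comment is hypothetical, and the pattern is a plain textual search that does not understand string literals.

```python
import re

def first_c_comment(src):
    """Return the body lines of the first /* ... */ comment in src, or []."""
    m = re.search(r"/\*(.*?)\*/", src, flags=re.DOTALL)
    if m is None:
        return []
    # Per line: drop indentation, one optional '*', and at most one space after it.
    lines = [re.sub(r"^[ \t]*\*?[ ]?", "", ln) for ln in m.group(1).splitlines()]
    # Drop blank lines left behind by the opening and closing delimiters.
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    return lines

# Illustrative input; only the first comment is returned.
src = "/*\n * A line\n *\n *   indented\n */\nint x; /* second */\n"
print("\n".join(first_c_comment(src)))
```

Because only one space after the asterisk is removed, any deeper indentation inside the comment is preserved.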
Answer 2 (score: -1)
sed -i -r "s/[\/\ ]{1}\*[\/\ ]?//g" YOURFILENAME
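This strips the comment markers in place and keeps the content. Note that it modifies the file YOURFILENAME itself and applies to every line, not only the first comment. If you don't want the file changed, remove -i from the command line.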
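To see what this substitution actually touches, here is a small Python sketch that applies a close rendering of the same pattern to a few lines. The helper name strip_markers and the third input line are made up for illustration, and the character class includes a backslash on the assumption that sed treats a backslash inside brackets literally.

```python
import re

# Close Python rendering of the sed expression s/[\/\ ]{1}\*[\/\ ]?//g
MARKER = re.compile(r"[/\\ ]\*[/\\ ]?")

def strip_markers(line):
    """Remove every marker-like sequence from one line (hypothetical helper)."""
    return MARKER.sub("", line)

print(repr(strip_markers(" * All rights reserved.")))    # 'All rights reserved.'
print(repr(strip_markers("#endif /* End of file. */")))  # '#endif End of file.'
print(repr(strip_markers("x = a * b;")))                 # 'x = ab;'
```

The last two lines show that the pattern is not limited to the first comment: later comments lose their delimiters, and a multiplication written with spaces around the asterisk is rewritten as well.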