Delete inline comments with sed

Date: 2013-03-17 14:55:16

Tags: sed comments

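I want to use sed to delete all comments from a text file. Assume that a comment starts with a '%' character and ends with a newline character. I want to delete everything from the '%' to the end of the line, including the newline. However, I do not want to delete comments that start with '%%'.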

Sample input:

%% comment to do not delete
% comment to delete
% another comment to delte
%% comment to do not delete
Some text % comment to delete
and some more text %% comment to do not delete

Desired output:

%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

5 answers:

Answer 0 (score: 2)

Try this:

$ perl -pe '/^[^%]*%%/ && next; s/%.*\n//g' file.txt

Output

%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

Note

If you need to change the file in place, add the -i switch (once you have tested the command), like this:

$ perl -i -pe '/^[^%]*%%/ && next; s/%.*\n//g' file.txt

Thanks to scrutinizer for the contribution.

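Answer 1 (score: 2)

This is a perfect fit for Perl's negative lookbehind (and lookahead) assertions, which match a % only when there is no % directly before or after it: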

perl -pe 's/(?<!%)%(?!%).*$//s' << END
%% comment to do not delete
% comment to delete
% another comment to delte
%% comment to do not delete
Some text % comment to delete
and some more text %% comment to do not delete
END

Output

%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

The s flag makes the dot match newlines as well, so the trailing newline is removed too, as the question requires.

Matching with a regex like this can get you into trouble, though. For example, if you have a line like
The date is `date +%Y%m%d` % this is a comment

you will end up with

The date is `date +

If your real comments are always set off by whitespace, you can use this regex instead:

(^| )%( .*|)$

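which means:

  • the start of the line or a space
  • followed by the comment character
  • followed by either a space and zero or more characters, or nothing
  • followed by the end of the line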

Answer 2 (score: 1)

Maybe this:

Second update:

$ sed -e '/^%[^%]/d' -e 's/ %[^%]*$/@/' -e :a -e '/@/N; s/\n//; ta' input | sed 's/@/ /g'
%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

Answer 3 (score: 0)

Edit: added changes so that it also works correctly on the last line of the file. Try:

sed -e :a -e '/^[^%]*%%/n; /%/{s/%.*//; N; s/\n//;};ta' file

Tested with this input:

%% comment to do not delete
% comment to delete
% another comment to delte
%
%% comment to do not delete
Some text % comment to delete
Some more text % more comment to delete
and some more text %% comment to do not delete
fdgdfgdgdgd %
gfdgd
some text followed by %% comment to not delete that contains a % somewhere
some text followed by % comment to delete that contains %% somewhere
hello there

Output:

%% comment to do not delete
%% comment to do not delete
Some text Some more text and some more text %% comment to do not delete
fdgdfgdgdgd gfdgd
some text followed by %% comment to not delete that contains a % somewhere
some text followed by hello there
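The one-liner is compact, so here is the same sed program spread over several lines with comments. A sketch assuming GNU sed; the function name strip_comments_sed is hypothetical. Run on the test input above, it should reproduce the output listed.

```shell
# Hypothetical name; GNU sed assumed. The one-liner, one command per line.
strip_comments_sed() {
  sed -e '
:a
# If the first % on the line starts "%%", print the line and load the
# next one (n).
/^[^%]*%%/n
# If a % is left, cut from it to the end of the line, then append the
# next line (N) and remove the newline between them.
/%/{
s/%.*//
N
s/\n//
}
# If a substitution succeeded since the last input line was read or the
# last t jump, go back to :a and check the joined line again.
ta
' "$@"
}
```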

Answer 4 (score: 0)

Using expression order with sed

With sed, the order of the commands can matter. For example:

$ sed -ne '/^% /d; /[^%]%.*/ {s/%.*//; N; s/\n//;}; p' /tmp/corpus
%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

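In this example, the sed script performs its tasks in this order:

  1. Suppress automatic printing of the pattern space (-n).
  2. Delete lines that start with a percent sign followed by a space.
  3. Use a substitution to remove everything from a single percent sign to the end of the line, then append the next line to the pattern space without a newline.
  4. Print the pattern space.

This script works for the corpus you provided in your question. It is not guaranteed to work with any other corpus without modification, and it will not work if the line appended to the pattern space contains a comment of its own, because that line is printed without being checked again.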
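The steps listed above can also be written out as a commented sed script. A sketch assuming GNU sed; strip_comments_ordered is a hypothetical name, and step 3 is implemented with N (which appends the next line) followed by s/\n//, so the next line is joined to the current one rather than replacing it:

```shell
# Hypothetical name; GNU sed assumed.
strip_comments_ordered() {
  sed -n -e '
# Step 1 is the -n option: nothing is printed unless p says so.
# Step 2: delete lines that start with "% ".
/^% /d
# Step 3: on a line with a % preceded by a non-% character (this also
# matches the first % of a mid-line "%%"), cut from the first % to the
# end of the line, append the next line with N, and remove the newline
# that N inserted.
/[^%]%/{
s/%.*//
N
s/\n//
}
# Step 4: print what is left.
p
' "$@"
}
```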