sed (awk?): delete nearly duplicate lines

Date: 2015-04-29 17:40:51

Tags: bash awk sed

I have a file that alternates HTML-style comments with their actual text:

<!-- Here's a first line -->
Here's a first line
<!-- Here's a second line -->
Here's a third line

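I want to delete a comment when it is identical to the line that follows it, apart from the comment tags themselves, and keep it otherwise, so the result would be: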

Here's a first line
<!-- Here's a second line -->
Here's a third line

I've read similar questions here, but couldn't work out a solution for my case from them.
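The title also asks about awk. The following is a minimal awk sketch of the rule described above; it is not part of the original answer, and `dedupe_comments_awk` is a hypothetical function name. It holds back each whole-line comment until it sees the next line, prints only the text if the two match, and otherwise prints the held comment and then examines the new line normally, so a comment followed by another comment is handled too.

```shell
# Hypothetical helper: collapse "<!-- X -->" followed by "X" into just "X".
dedupe_comments_awk() {
  awk '
    {
      # A comment is being held back: compare it with the current line.
      if (have) {
        have = 0
        if ($0 == text) { print; next }   # duplicate: print only the text
        print pending                     # different: keep the comment
      }
      # Hold back a whole-line comment and remember its inner text.
      if ($0 ~ /^<!-- .* -->$/) {
        pending = $0
        text = substr($0, 6, length($0) - 9)   # strip "<!-- " and " -->"
        have = 1
        next
      }
      print
    }
    # A comment on the very last line has no successor; print it as is.
    END { if (have) print pending }
  '
}
```

For example, `printf '%s\n' '<!-- A -->' '<!-- B -->' 'B' 'C' | dedupe_comments_awk` prints `<!-- A -->`, `B` and `C`.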

1 answer:

Answer (score: 1)

sed '/^<!-- \(.*\) -->$/N;s/^<!-- \(.*\) -->\n\1$/\1/'
#
#   /^<!-- \(.*\) -->$/   if the whole line is an HTML comment,
#   N;                    append the next input line to the pattern space;
#
#   s/^<!-- \(.*\) -->\n\1$/\1/
#                         if that next line is exactly the comment text (\1),
#                         replace the comment/line pair with just the text;
#                         otherwise both lines are printed unchanged
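Because N consumes the following line, the command above never re-examines that line on its own. So in input such as `<!-- A -->`, `<!-- B -->`, `B`, the line `<!-- B -->` is swallowed together with `<!-- A -->` and is never compared with `B`. One way around this is the classic sed P/D idiom, sketched below under the assumption of GNU sed or another POSIX sed; `dedupe_comments` is a hypothetical function name, and the regexes are the answer's own.

```shell
# Hypothetical helper name; the sed script reuses the answer's regexes.
#   $!N  append the next line (skipped on the last line, because some sed
#        implementations quit without printing when N finds no more input)
#   s    collapse "<!-- X -->" followed by "X" into "X"
#   t    if that succeeded, jump to the end of the script (auto-print)
#   P    otherwise print only the first line (the comment)...
#   D    ...delete it and rerun the script on the remaining line
dedupe_comments() {
  sed '
    /^<!-- \(.*\) -->$/{
      $!N
      s/^<!-- \(.*\) -->\n\1$/\1/
      t
      P
      D
    }
  '
}
```

With this version, `printf '%s\n' '<!-- A -->' '<!-- B -->' 'B' 'C' | dedupe_comments` prints `<!-- A -->`, `B` and `C`, whereas the one-liner above keeps `<!-- B -->` as well.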

For example:

$ sed '/^<!-- \(.*\) -->$/N;s/^<!-- \(.*\) -->\n\1$/\1/' <<EOF
> <!-- Foo -->
> Foo
> <!-- Bar -->
> Baz
> <!-- Quux -->
> Quux
> 
> Something
> Something
> Another something
> EOF

This gives:

Foo
<!-- Bar -->
Baz
Quux

Something
Something
Another something

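You may need to adjust this to handle indentation, but that should be a small change. You may also want to switch to sed -r (or sed -E, which both GNU and BSD sed accept), in which case the parentheses must not be escaped.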
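As a sketch of both suggestions, here is the answer's command rewritten with extended regexes (sed -E, so the parentheses are unescaped) and allowing leading whitespace. It assumes GNU sed, since backreferences such as \1 inside an extended regex are an extension that not every sed supports; `dedupe_comments_indented` is a hypothetical name. It also assumes that a comment and its text line must carry the same indentation to count as duplicates, and it keeps that indentation in the output. Like the original command, it handles one comment/line pair at a time.

```shell
# Hypothetical helper; assumes GNU sed (backreferences in -E regexes).
#   \1 = leading whitespace of the comment line
#   \2 = text inside "<!-- " and " -->"
# The pair collapses only when the next line is \1\2 exactly, i.e. the
# same text with the same indentation; the indentation is kept.
dedupe_comments_indented() {
  sed -E '
    /^[[:space:]]*<!-- (.*) -->$/{
      $!N
      s/^([[:space:]]*)<!-- (.*) -->\n\1\2$/\1\2/
    }
  '
}
```

For example, `'  <!-- Foo -->'` followed by `'  Foo'` becomes `'  Foo'`, while `'  <!-- Qux -->'` followed by `'    Qux'` is left alone because the indentation differs.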