How to delete lines containing any matching text in bash

Asked: 2015-03-27 07:34:51

Tags: linux bash awk sed grep

I have a text file. It looks like this:

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 2500: $103.17 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 5000: $170.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 250: $42.25 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 500: $44.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 1000: $54.08 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 2500: $79.33 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 5000: $144.33 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 500: $159.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 1000: $176.17 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 2500: $297.58 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 5000: $522.72 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 250: $138.70 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 500: $164.50 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 1000: $181.13 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 2500: $302.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 5000: $515.63 :

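So I have Business Cards, and I have Door Hangers. Each of these is a single item, but to count them I need to remove every line for an item except the first.

So in the end, the file would look like this: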

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :

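I have to do this without specifying the exact names, i.e. I can't run sed specifically for Business Card or for Door Hanger. I just need to delete all lines that share any similarity with an earlier line, not only exact duplicates.

Thanks

2 answers:

Answer 0: (score: 1)

Using awk you can do: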

awk -F":" '$1!=k{print $0}{k=$1}' file.txt

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :

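This tests whether the first field is equal to the first field of the previous line. If it is equal, nothing is printed and the field is simply saved (k=$1); if it is not equal, the line is printed.

When the lines for each item are grouped together, as they are here, this can be shortened to the following, which keeps the first line for each distinct first field anywhere in the file: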

awk -F: '!seen[$1]++' file.txt

(Credit to JID and glenn jackman)

Or, if you had a fixed number of columns, you could do:
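To make the difference between the two awk commands concrete, here is a small sketch on made-up data (the temporary file and its contents are assumptions for illustration, not the asker's file). The first command compares each line only with the one before it, while `!seen[$1]++` remembers every first field it has already printed.

```shell
# Hypothetical sample data: "Business Card" reappears after "Door Hanger".
tmp=$(mktemp)
printf '%s\n' \
  'Business Card: 2x3.5: 250: $42.25 :' \
  'Business Card: 2x3.5: 500: $44.00 :' \
  'Door Hanger: 3.5x8.5: 250: $136.23 :' \
  'Business Card: 2x3.5: 1000: $54.08 :' > "$tmp"

# Compares each line only with the previous one, so the late
# "Business Card" line is printed a second time.
adjacent=$(awk -F: '$1!=k{print $0}{k=$1}' "$tmp")

# Remembers every first field seen so far, so each item is printed once.
global=$(awk -F: '!seen[$1]++' "$tmp")

printf 'adjacent:\n%s\n\nglobal:\n%s\n' "$adjacent" "$global"
rm -f "$tmp"
```

On the file in the question both commands give the same result, because all lines for an item are adjacent.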

rev file.txt | uniq -f 17 | rev

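You reverse each line of the file and skip the first 17 fields, so that uniq is applied to the last field (which is actually the first column), and then reverse the lines back. It is not very convenient here, though, because your lines do not all have the same number of columns.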
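As a rough illustration of the rev/uniq idea, the sketch below uses a made-up file in which every line has exactly three whitespace-separated fields, so `-f 2` takes the place of `-f 17`. The data and variable names are assumptions for the example only.

```shell
# Hypothetical data: every line has exactly three whitespace-separated fields.
tmp=$(mktemp)
printf '%s\n' \
  'Card 250 $42.25' \
  'Card 500 $44.00' \
  'Hanger 250 $136.23' \
  'Hanger 500 $159.53' > "$tmp"

# After rev, the item name is the last field on each line;
# -f 2 makes uniq ignore the two fields in front of it when comparing.
first_per_item=$(rev "$tmp" | uniq -f 2 | rev)

printf '%s\n' "$first_per_item"
rm -f "$tmp"
```

Like plain uniq, this only collapses adjacent lines, and it relies on the field count being the same on every line.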

HTH

Answer 1: (score: 0)

Based on your comment, a simple way is:

cat filename | awk -F ":" '{print $1}' | sort | uniq
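Since the stated goal was to count the items, one possible follow-up is to count the unique names that this pipeline prints. This is only a sketch: the sample data and the `names` and `count` variables are assumptions, and piping to `wc -l` is my addition rather than part of the answer.

```shell
# Hypothetical sample data with two distinct items.
tmp=$(mktemp)
printf '%s\n' \
  'Business Card: 2x3.5: 250: $42.25 :' \
  'Business Card: 2x3.5: 500: $44.00 :' \
  'Door Hanger: 3.5x8.5: 250: $136.23 :' > "$tmp"

# Print the first colon-separated field of each line, then drop duplicates.
names=$(awk -F ":" '{print $1}' "$tmp" | sort | uniq)

# Count the unique names (tr strips the padding some wc versions add).
count=$(printf '%s\n' "$names" | wc -l | tr -d ' ')

printf '%s\n' "$names"
printf 'items: %s\n' "$count"
rm -f "$tmp"
```

Note that this prints only the item names, not the full first line for each item.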