Remove duplicate data using bash, sed, or awk

Time: 2015-03-20 14:59:02

Tags: bash awk sed

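How can I find duplicate data using bash, sed, or awk? The goal is to remove the duplicate "Changelist: XXXXX" entries from data.txt, keeping only the first occurrence of each. I'm a bit stuck; can anyone help?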

See output.txt below for the desired output.

data.txt

====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview: 
 CodeReview: Result: @result___
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview: Result: 
 ====================================
 Changelist: 808271
 Date: 2015/03/19
 Developer: C
 ShortDescr: HI

 CodeReview: 
 ====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview: 
 CodeReview: Result: @result___
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview: Result:  
 ====================================
  Changelist: 808277
 Date: 2015/03/19
 Developer: D
 ShortDescr: HEY

 CodeReview: 
 ====================================

output.txt

    ====================================
     Changelist: 808298
     Date: 2015/03/19
     Developer: A
     ShortDescr: Checking in the following graphics:

     CodeReview: 
     CodeReview: Result: @result___
     ====================================
     Changelist: 808273
     Date: 2015/03/19
     Developer: B
     ShortDescr: Hello

     CodeReview: Result: 
     ====================================
     Changelist: 808271
     Date: 2015/03/19
     Developer: C
     ShortDescr: HI

     CodeReview: 
     ====================================
      Changelist: 808277
     Date: 2015/03/19
     Developer: D
     ShortDescr: HEY

     CodeReview: 
     ====================================


glen's output.txt

 ====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview:
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview:
 ====================================
 Changelist: 808271
 Date: 2015/03/19
 Developer: C
 ShortDescr: HI

 CodeReview: 
 ====================================
  Changelist: 808277
 Date: 2015/03/19
 Developer: D
 ShortDescr: HEY

 CodeReview: 
 ====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview:
 ====================================$sep

1 answer:

Answer 0 (score: 2)

This is actually a very common task for awk:

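# Record separator: the line of '=' characters; awk expands the \n escape when it is assigned with -v.
sep='====================================\n'
# Each block is one record and each line one field; print a record only the first time its first line is seen.
awk -F'\n' -v RS="$sep" -v ORS="$sep" '!seen[$1]++' data.txt > output.txt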

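Here we use $sep as awk's record separator, so that each block is read as one record, and a newline as the field separator, so that each line of a block is one field.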
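One caveat: a record separator longer than one character is an extension (gawk and some other implementations accept it), and POSIX leaves the behaviour unspecified. If your awk does not support it, the same block-wise de-duplication can be sketched line by line. This is an alternative approach, not the command above; the function name `dedupe_changelists` is made up for illustration, and the sketch assumes every block starts with a `Changelist: N` line directly after a separator line.

```shell
# Hypothetical helper: keep only the first block for each changelist number.
# Assumes separator lines consist solely of '=' (optionally indented) and that
# the changelist number is the second whitespace-separated word of its line.
dedupe_changelists() {
    awk '
        # Hold each separator line back until we know whether its block is kept.
        /^[ \t]*=+[ \t]*$/ { sep = $0; have_sep = 1; keep = 0; next }

        # The Changelist line decides the fate of the whole block.
        have_sep && $1 == "Changelist:" {
            keep = !seen[$2]++        # true only for the first occurrence
            if (keep) print sep       # emit the separator that opened the block
            have_sep = 0
        }

        keep                          # print every line of a kept block

        # Close the output with the final separator if the input ended on one.
        END { if (have_sep) print sep }
    ' "$@"
}

# Usage with the file names from the question:
#   dedupe_changelists data.txt > output.txt
```

Because the key is the changelist number rather than the raw first line, differences in indentation (such as the extra space before `Changelist: 808277`) do not affect the comparison.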

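!seen[$1]++ is an expression that is true only for the first record with a given value of field 1: seen[$1] is zero (unset) the first time that value appears, and the post-increment makes it non-zero for every later record. Since no action is given, awk performs the default action, which is to print the current record followed by the output record separator.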
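To see the idiom in isolation, here is a minimal sketch on ordinary line-oriented input, written out in long form. The function name `dedupe_first_field` and the sample lines are made up for illustration; the condition is meant to be equivalent to `!seen[$1]++`.

```shell
# Hypothetical helper: print only the first line seen for each value of field 1.
dedupe_first_field() {
    awk '{
        # seen[$1] evaluates to 0 the first time a key appears, so the test
        # succeeds once; the post-increment then marks the key as seen.
        if (seen[$1]++ == 0) print
    }'
}

# The third line repeats the key 808298 and is dropped.
printf '%s\n' '808298 A' '808273 B' '808298 A again' '808277 D' | dedupe_first_field
```

The first occurrence always wins, and the surviving lines keep their original order, which is why no sorting is needed.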