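How can I search for duplicate data using batch, sed, or awk? The goal is to remove the duplicate "Changelist: XXXXX" entries from data.txt. I'm a bit stuck; can anyone help?
See output.txt for the desired output.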
data.txt
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================
output.txt
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================
glen's output.txt
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
====================================
Answer 0 (score: 2)
This is actually a very common task for awk:
sep='====================================\n'
awk -F'\n' -v RS="$sep" -v ORS="$sep" '!seen[$1]++' data.txt > output.txt
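Here we use $sep as the awk record separator, so that each paragraph is read as one record, and a newline as the field separator, so that $1 is the first line of the record (the "Changelist: XXXXX" line).
!seen[$1]++ is an expression that is true only for the first record in which a particular value of field 1 is encountered. Since no action is given, the default action applies: print the current record, with the output record separator appended.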
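To make the idiom easier to follow, here is a minimal sketch that spells out the same logic with an explicit action instead of the bare `!seen[$1]++` pattern. The file names `sample.txt` and `sample_out.txt` and the shortened records are made up for illustration, and the sketch assumes an awk that accepts a multi-character record separator (gawk and mawk do; POSIX leaves this unspecified).

```shell
# Build a small sample in the same layout as data.txt
# (sample.txt and its contents are hypothetical).
sep_line='===================================='
{
  printf '%s\n' "$sep_line"
  printf 'Changelist: 1\nDeveloper: A\n%s\n' "$sep_line"
  printf 'Changelist: 2\nDeveloper: B\n%s\n' "$sep_line"
  printf 'Changelist: 1\nDeveloper: A\n%s\n' "$sep_line"
} > sample.txt

# awk -v turns the literal \n into a newline, as in the answer above.
sep="$sep_line"'\n'

# Same behavior as '!seen[$1]++', written out step by step.
awk -F'\n' -v RS="$sep" -v ORS="$sep" '
{
    key = $1                 # first line of the record, e.g. "Changelist: 1"
    if (!(key in seen)) {    # true only the first time this key appears
        seen[key] = 1
        print                # prints the record followed by ORS
    }
}' sample.txt > sample_out.txt
```

The second "Changelist: 1" record is skipped because its key is already in `seen`, so `sample_out.txt` keeps one copy of each changelist in the order of first appearance.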