My file has the following format:
3313513003|481206309|3008296|2111|20150218000000|20150218000000|100|200|
3313513003|481206309|3008296|2111|20150218000000|20150219000000|000|010|
3313513000|481206306|4610335|47498|20150217000000|20150317000000|000|000|
3313513000|481206306|4610335|47498|20150217000000|20150317000000|000|010|
3313513000|481206306|4610335|47498|20141219000000|20150118000000|200|000|
3313513000|481206306|4610335|47498|20141219000000|20150118000000|200|010|
3313513000|481206306|4610335|47498|20141105000000|20141205000000|200|010|
3313513000|481206306|4610335|47498|20141105000000|20141205000000|200|000|
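The script should remove the redundant records from the file, according to the following criteria:
1. The file contains multiple records with the same $1, $2, $3, $4. We need to drop the extra records and print only the latest two.
2. The latest records are determined by comparing $6 among the records that share the same $1, $2, $3, $4.
3. There will be two records with the same value of $6 but different values of $7 and $8. We need to print both of them.
The output file should look like this: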
3313513003|481206309|3008296|2111|20150218000000|20150218000000|100|200|
3313513003|481206309|3008296|2111|20150218000000|20150219000000|000|010|
There are no multiple records for this case.
3313513000|481206306|4610335|47498|20150217000000|20150317000000|000|000|
3313513000|481206306|4610335|47498|20150217000000|20150317000000|000|010|
The 6 records present in the File are compressed into 2 for this case.
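Answer 0 (score: 1)
I still don't think I understand the logic you want to apply, but based on what I think I understand of your requirements, this would be the right approach: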
$ cat tst.awk
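BEGIN { FS = "|" }
{ key = $1 FS $2 FS $3 FS $4 }    # group records by the first four fields
key != prevKey || $6 == prev6     # print the first record of a key, or one whose $6 equals the previous record's $6
{ prevKey = key; prev6 = $6 }     # remember this record for the next comparison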
$ awk -f tst.awk file
3313513003|481206309|3008296|2111|20150218000000|20150218000000|100|200|
3313513000|481206306|4610335|47498|20150217000000|20150317000000|000|000|
3313513000|481206306|4610335|47498|20150217000000|20150317000000|000|010|
3313513000|481206306|4610335|47498|20141219000000|20150118000000|200|010|
3313513000|481206306|4610335|47498|20141105000000|20141205000000|200|000|
Hopefully you can take it from there.
Answer 1 (score: 1)
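My interpretation of your question can be answered with:
cut -d"|" -f1-4 yourfile | sort -u | while IFS= read -r key; do
  # the trailing "|" stops a key from also matching longer keys that share its prefix
  grep "^${key}|" yourfile | sort -t"|" -u -k6,6 | tail -2
done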
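For comparison, here is a minimal sketch of another reading of the question: for each $1, $2, $3, $4 key, keep the two records with the largest $6 and print them in their original input order. It is not taken from either answer. It assumes $6 is always a fixed-width 14-digit timestamp, so that plain string order matches chronological order, and that ties on $6 are broken by input order. The function name latest_two is hypothetical.

```shell
#!/bin/sh
# latest_two FILE: hypothetical helper, not part of the original answers.
# Assumes $6 is a fixed-width timestamp, so byte order equals date order.
latest_two() {
    # 1. Prefix each record with its line number so input order can be restored.
    awk -F'|' '{ print NR FS $0 }' "$1" |
    # 2. Sort by key (now fields 2-5), newest $6 first (now field 7),
    #    breaking ties by the original line number.
    LC_ALL=C sort -t'|' -k2,5 -k7,7r -k1,1n |
    # 3. Keep only the first two records seen for each key.
    awk -F'|' '{ key = $2 FS $3 FS $4 FS $5 } ++seen[key] <= 2' |
    # 4. Restore the input order and strip the line-number prefix.
    LC_ALL=C sort -t'|' -k1,1n |
    cut -d'|' -f2-
}
```

Calling latest_two file on the sample input from the question should print the four lines the question lists as the expected output. Unlike the sort -u -k6,6 approach, this keeps records that share the same $6 but differ in $7 and $8.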