Print only the last line of each group of repeated lines in UNIX

Time: 2017-09-06 15:06:28

Tags: unix awk printing lines

I have a file with lines like this:

20170824 08:00:00 21.1804 22.1807
20170824 08:00:00 21.1805 22.1806
20170824 08:00:00 21.1804 22.1807
20170824 08:00:00 21.1805 22.1806
20170824 08:00:00 21.1804 22.1806
20170824 08:00:01 21.1803 22.1806
20170824 08:00:01 21.1804 22.1806
20170824 08:00:01 21.1803 22.1807
20170824 08:00:01 21.1803 22.1806
20170824 08:00:01 21.1803 22.1806
20170824 08:00:02 21.1803 22.1805
20170824 08:00:02 21.1804 22.1808
20170824 08:00:02 21.1804 22.1806
20170824 08:00:02 21.1804 22.1807
20170824 08:00:03 21.1804 22.1808
20170824 08:00:03 21.1803 22.1807
20170824 08:00:03 21.1803 22.1805
20170824 08:00:03 21.1804 22.1806
20170824 08:00:05 21.1804 22.1807
20170824 08:00:05 21.1804 22.1808
20170824 08:00:05 21.1805 22.1806
20170824 08:00:05 21.1804 22.1807
20170824 08:00:05 21.1805 22.1806

My goal is to print only the last line for each repeated timestamp. For example, the output should be:

20170824 08:00:00 21.1804 22.1806
20170824 08:00:01 21.1803 22.1806
20170824 08:00:02 21.1804 22.1807
20170824 08:00:03 21.1804 22.1806
20170824 08:00:05 21.1805 22.1806

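I could split the columns with some character so that I can use AWK. Any ideas on this?

4 answers:

Answer 0 (score: 3)

Using GNU sort with -s (stable sort). With -u, sort keeps the first line for each key, and because tac has reversed the file, that is the last line of each group in the original: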

$ tac file | sort -k1,2 -su
20170824 08:00:00     21.1804     22.1806
20170824 08:00:01     21.1803     22.1806
20170824 08:00:02     21.1804     22.1807
20170824 08:00:03     21.1804     22.1806
20170824 08:00:05     21.1805     22.1806

Otherwise:

$ tac file | awk '!seen[$1,$2]++' | tac
20170824 08:00:00     21.1804     22.1806
20170824 08:00:01     21.1803     22.1806
20170824 08:00:02     21.1804     22.1807
20170824 08:00:03     21.1804     22.1806
20170824 08:00:05     21.1805     22.1806
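
Since tac comes from GNU coreutils and may be missing on some systems, the same idea can be sketched in a single awk pass. This is only a sketch under assumptions: last_per_key is a hypothetical helper name, not something from the answer above, and the approach keeps one line per distinct key in memory.

```shell
# Hypothetical helper (not from the answer above): keep the last line
# seen for each (date, time) key in a single pass, without tac.
last_per_key() {
  awk '
    {
      key = $1 FS $2                          # date and time form the key
      if (!(key in last)) order[++n] = key    # remember first-seen order of keys
      last[key] = $0                          # later lines overwrite earlier ones
    }
    END {
      for (i = 1; i <= n; i++) print last[order[i]]
    }
  ' "$@"
}

# Small demonstration on four lines taken from the question.
printf '%s\n' \
  '20170824 08:00:00 21.1804 22.1807' \
  '20170824 08:00:00 21.1804 22.1806' \
  '20170824 08:00:01 21.1803 22.1807' \
  '20170824 08:00:01 21.1803 22.1806' | last_per_key
```

Like the tac and awk pipeline, this also works when lines with the same key are not adjacent; groups are printed in the order in which each key first appears.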

Answer 1 (score: 1)

awk to the rescue!

$ awk '{k=$1 FS $2} NR>1 && p!=k{print p0} {p0=$0; p=k} END{print}' file

20170824 08:00:00     21.1804     22.1806
20170824 08:00:01     21.1803     22.1806
20170824 08:00:02     21.1804     22.1807
20170824 08:00:03     21.1804     22.1806
20170824 08:00:05     21.1805     22.1806

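Explanation

Set the key; starting from the second line, if the key differs from the previous key, print the previous line; save the current line and the current key for use in the next iteration; finally, print the last line.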
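
The steps in this explanation can be spelled out as a commented, multi-line version of the one-liner. This is a sketch, not the answerer's exact code: last_of_each_run is a hypothetical name, the variables are renamed for readability, and the END block prints the saved line (guarded against empty input) rather than relying on $0 still being set in END.

```shell
# Hypothetical expanded form of the one-liner above.
last_of_each_run() {
  awk '
    { key = $1 FS $2 }                 # set the key from the date and time fields
    NR > 1 && prev_key != key {        # from the second line on, if the key changed,
      print prev_line                  # print the previous line (last of its run)
    }
    { prev_line = $0; prev_key = key } # save current line and key for the next iteration
    END { if (NR > 0) print prev_line }  # print the last line of the final run
  ' "$@"
}

# Small demonstration on lines taken from the question.
printf '%s\n' \
  '20170824 08:00:00 21.1805 22.1806' \
  '20170824 08:00:00 21.1804 22.1806' \
  '20170824 08:00:01 21.1803 22.1806' | last_of_each_run
```

Unlike the seen[] approach, this only compares each line with the one before it, so it assumes lines with the same timestamp are adjacent, as they are in the sample file.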

Answer 2 (score: 0)

awk solution:

awk '{k=$1 FS $2}!a[k]++ && r{print r}{ r=$0 }END{print}' file
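  • k=$1 FS $2 - builds a unique key by concatenating the date and time columns

  • !a[k]++ && r - on encountering a new date-time (!a[k]++) when a previously processed line r exists, prints the last line captured by r=$0 in the previous pass

Output:

20170824 08:00:00 21.1804 22.1806
20170824 08:00:01 21.1803 22.1806
20170824 08:00:02 21.1804 22.1807
20170824 08:00:03 21.1804 22.1806
20170824 08:00:05 21.1805 22.1806

----------

Bonus solution using the GNU datamash tool:

datamash -Wt' ' -g 1,2 last 3 last 4 <file
  • -g 1,2 - groups records by the 1st and 2nd fields (date-time)

  • last 3 last 4 - means "output only the last entry of the 3rd and 4th fields for each date-time group"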

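Answer 3 (score: 0)

Thanks. All of the answers are valuable to me...

**The problem now is that I don't actually have "full" duplicates in the last (time) column and the first column, which means the lines somehow need to be grouped by the first column and checked for the last value in the last column...

Any ideas on how to solve this?**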

21.1804 | 22.1807 | 20160324 | 16:00:09
21.1805 | 22.1806 | 20160324 | 16:00:11
21.1804 | 22.1807 | 20160324 | 16:00:25
21.1805 | 22.1806 | 20160324 | 16:00:28
21.1804 | 22.1806 | 20160324 | 16:00:47
21.1803 | 22.1806 | 20160324 | 16:00:55
21.1804 | 22.1806 | 20160324 | 16:01:03
21.1803 | 22.1807 | 20160324 | 16:01:07
21.1803 | 22.1806 | 20160324 | 16:01:25
21.1803 | 22.1806 | 20160324 | 16:01:26
21.1803 | 22.1805 | 20160324 | 16:01:40
21.1804 | 22.1808 | 20160324 | 16:01:47
21.1804 | 22.1806 | 20160324 | 16:01:55
21.1804 | 22.1807 | 20160324 | 16:02:04
21.1804 | 22.1808 | 20160324 | 16:02:07
21.1803 | 22.1807 | 20160324 | 16:02:44
21.1803 | 22.1805 | 20160324 | 16:02:56
21.1804 | 22.1806 | 20160324 | 16:03:07
21.1804 | 22.1807 | 20160324 | 16:03:14
21.1804 | 22.1808 | 20160324 | 16:03:24
21.1805 | 22.1806 | 20160324 | 16:03:46
21.1804 | 22.1807 | 20160324 | 16:03:55
21.1805 | 22.1806 | 20160324 | 16:04:03
21.1804 | 22.1807 | 20160324 | 16:04:27
21.1805 | 22.1806 | 20160324 | 16:04:28
21.1804 | 22.1807 | 20160324 | 16:04:49
21.1805 | 22.1806 | 20160324 | 16:04:17
21.1804 | 22.1806 | 20160324 | 16:05:01
21.1803 | 22.1806 | 20160324 | 16:05:03
21.1804 | 22.1806 | 20160324 | 16:05:06
21.1803 | 22.1807 | 20160324 | 16:05:11
21.1803 | 22.1806 | 20160324 | 16:05:15
21.1803 | 22.1806 | 20160324 | 16:05:24
21.1803 | 22.1805 | 20160324 | 16:06:18
21.1804 | 22.1808 | 20160324 | 16:06:24
21.1804 | 22.1806 | 20160324 | 16:06:36
21.1804 | 22.1807 | 20160324 | 16:06:40
21.1804 | 22.1808 | 20160324 | 16:06:56
21.1803 | 22.1807 | 20160324 | 16:07:00
21.1803 | 22.1805 | 20160324 | 16:07:07
21.1804 | 22.1806 | 20160324 | 16:07:22
21.1804 | 22.1807 | 20160324 | 16:07:25
21.1804 | 22.1808 | 20160324 | 16:08:15
21.1805 | 22.1806 | 20160324 | 16:08:27
21.1804 | 22.1807 | 20160324 | 16:08:39
21.1805 | 22.1806 | 20160324 | 16:09:11
21.1804 | 22.1807 | 20160324 | 16:09:25
21.1805 | 22.1806 | 20160324 | 16:09:25
21.1804 | 22.1807 | 20160324 | 16:09:38
21.1805 | 22.1806 | 20160324 | 16:09:39
21.1804 | 22.1806 | 20160324 | 16:09:47
21.1803 | 22.1806 | 20160324 | 16:09:55
21.1804 | 22.1806 | 20160324 | 16:09:56
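
The follow-up does not state the grouping rule precisely, so the following is only a sketch under a loudly stated assumption: that the goal is to keep the last line of each run of lines sharing the same date and minute (HH:MM) in this pipe-separated layout. The name last_per_minute and the choice of minute as the grouping unit are hypothetical, not taken from the thread.

```shell
# Hypothetical helper: ASSUMES the intent is "last line per date and minute".
# Fields are separated by "|" with optional surrounding spaces:
#   $1, $2 = values, $3 = date, $4 = time (HH:MM:SS)
last_per_minute() {
  awk -F ' *[|] *' '
    { key = $3 " " substr($4, 1, 5) }   # date plus HH:MM as the grouping key
    NR > 1 && prev_key != key {         # minute changed:
      print prev_line                   # emit the last line of the previous minute
    }
    { prev_line = $0; prev_key = key }  # remember current line and key
    END { if (NR > 0) print prev_line } # emit the last line of the final minute
  ' "$@"
}

# Small demonstration on lines taken from the sample above.
printf '%s\n' \
  '21.1804 | 22.1807 | 20160324 | 16:00:09' \
  '21.1805 | 22.1806 | 20160324 | 16:00:11' \
  '21.1804 | 22.1806 | 20160324 | 16:01:03' | last_per_minute
```

"Last" here means last by position in the file, not latest by time. That matters for this sample, where 16:04:17 appears after 16:04:49, so the line printed for that minute is the 16:04:17 one.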