Comparing line by line

Time: 2014-12-10 08:48:10

Tags: arrays awk

I have a list of IDs like the one below.

20140201,ZTE_GENERIC_959,ZTE_GENERIC_959,PREPAID,ZTE_GENERIC_959,0,0,0,0,0,0,0,-120,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_959,ZTE_GENERIC_959,PREPAID,ZTE_GENERIC_959,-100,568,0,0,0,0,0,-25,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_988,ZTE_GENERIC_988,PREPAID,ZTE_GENERIC_988,-9,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_1010,ZTE_GENERIC_1010,PREPAID,ZTE_GENERIC_1010,0,0,0,0,0,0,0,-141,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_959,ZTE_GENERIC_959,PREPAID,ZTE_GENERIC_959,0,0,0,0,0,0,-79,-67,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_959,ZTE_GENERIC_959,PREPAID,ZTE_GENERIC_959,0,0,0,0,0,0,-474,146,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_1219,ZTE_GENERIC_1219,HYBRIDE,ZTE_GENERIC_1219,0,0,0,0,0,0,0,0,-210,137,0,0,0,0,0,0
20140201,ZTE_GENERIC_1010,ZTE_GENERIC_1010,PREPAID,ZTE_GENERIC_1010,-127.5,85,0,0,0,0,0,0,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_988,ZTE_GENERIC_988,PREPAID,ZTE_GENERIC_988,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_1081,ZTE_GENERIC_1081,PREPAID,ZTE_GENERIC_1081,-126.4,71,0,0,0,0,-63.2,11,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_959,ZTE_GENERIC_2_ZTE_GENERIC_959,PREPAID,ZTE_GENERIC_959,0,0,0,0,0,0,0,-142,0,0,0,0,0,0,0,0

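I am looking for an awk script that finds the duplicates in this list. The script I am using only considers the first column, so its output is wrong. I want to compare at least 3 or 4 columns so that the result is correct.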

2 answers:

Answer 0: (score: 0)

Try this:

1)

awk 'a[$0]++' File

This prints the duplicated lines, that is, every occurrence of a line after its first.

2)

awk '!a[$0]++' File

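This removes all duplicate lines, if that is what you want. Both commands compare the entire line.

We use a counter array a indexed by the entire line. The first time a line is seen its count is zero, so !a[$0]++ is true and the line is printed; the count is then incremented by 1. The next time the same line appears, the condition is false because the count for that line is no longer zero, so the duplicate is skipped.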
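As a small illustration of the counter behaviour described above, here is a sketch on a made-up three-line sample (the file contents and the variable names sample, dups and uniq_lines are assumptions for the example, not part of the question's data):

```shell
# Hypothetical sample: the first and third lines are identical.
sample=$(mktemp)
printf '%s\n' 'x,1' 'y,2' 'x,1' > "$sample"

# a[$0]++ yields the old count: 0 (false) on first sight, non-zero afterwards,
# so only the repeat occurrences are printed.
dups=$(awk 'a[$0]++' "$sample")

# Negating the test keeps only the first occurrence of each line.
uniq_lines=$(awk '!a[$0]++' "$sample")

echo "repeats:"
echo "$dups"
echo "first occurrences:"
echo "$uniq_lines"
```

The post-increment matters: the test reads the count before it is bumped, which is why the first occurrence and the later ones are treated differently.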

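Answer 1: (score: 0)

First, your question is not clear. Please state whether you want a three-column or a four-column comparison. If complete lines need to be compared, you already have A M D's solution. To compare columns, use the same approach with a small change: add the -F, flag so that the comma is the field separator.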

For 3 columns:

awk -F, '!a[$1,$2,$3]++' File

For 4 columns:

awk -F, '!a[$1,$2,$3,$4]++' File
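To list the repeated rows rather than remove them, the same counter idea can be keyed on the first four fields. The sketch below uses four rows copied from the question; the variable names data, repeats and counts and the array names seen and n are chosen for illustration. The key is built with comma subscripts (or with FS between the fields) because plain concatenation such as $1$2 cannot tell "ab","c" apart from "a","bc".

```shell
# Four rows from the question; rows 1, 2 and 4 share fields 1-4.
data=$(mktemp)
cat > "$data" <<'EOF'
20140201,ZTE_GENERIC_959,ZTE_GENERIC_959,PREPAID,ZTE_GENERIC_959,0,0,0,0,0,0,0,-120,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_959,ZTE_GENERIC_959,PREPAID,ZTE_GENERIC_959,-100,568,0,0,0,0,0,-25,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_988,ZTE_GENERIC_988,PREPAID,ZTE_GENERIC_988,-9,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0
20140201,ZTE_GENERIC_959,ZTE_GENERIC_959,PREPAID,ZTE_GENERIC_959,0,0,0,0,0,0,-79,-67,0,0,0,0,0,0,0,0
EOF

# Print every row whose first four fields have already been seen.
repeats=$(awk -F, 'seen[$1,$2,$3,$4]++' "$data")

# Count rows per key and report only the keys that occur more than once.
counts=$(awk -F, '{ n[$1 FS $2 FS $3 FS $4]++ }
                  END { for (k in n) if (n[k] > 1) print n[k], k }' "$data")

echo "$repeats"
echo "$counts"
```

Dropping $4 from both keys gives the three-column variant.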