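Print columns with unique values

Time: 2019-01-06 07:00:14

Tags: awk sed

I have 2 columns in a text file. For each repeated value in the second column, I want to print all the distinct first-column values associated with it.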

VZTFARATGJBBCEGIM01 RGROUP-GIMAGES
VZTFARATGJBFFEGIM01 RGROUP-GIMAGES
VZTFARATGJBSTEGIM01 RGROUP-GIMAGES
VZTFARATGJBBCE024701 RGROUP-ENV0247
VZTFARATGJBFFE024701 RGROUP-ENV0247
VZTFARATGJBSTE024701 RGROUP-ENV0247
VZTFARATGOD11E024701 RGROUP-ENV0247

Expected output:

GROUP-ENV0247
VZTFARATGJBBCE024701
VZTFARATGJBFFE024701
VZTFARATGJBSTE024701
VZTFARATGOD11E024701

GROUP-GIMAGES
VZTFARATGAWSTEGIM01
VZTFARATGENTFEGIM01
VZTFARATGJBBCEGIM01
VZTFARATGJBFFEGIM01

2 answers:

Answer 0: (score: 1)

Here is a solution in awk.

awk -F'[ ]' '{ b[$2]=b[$2]  $1 "\n" } END { for (c in b) { print c; print b[c] }}' test.txt

where test.txt contains the following values:

VZTFARATGJBBCEGIM01 RGROUP-GIMAGES 
VZTFARATGJBFFEGIM01 RGROUP-GIMAGES 
VZTFARATGJBSTEGIM01 RGROUP-GIMAGES 
VZTFARATGJBBCE024701 RGROUP-ENV0247 
VZTFARATGJBFFE024701 RGROUP-ENV0247 
VZTFARATGJBSTE024701 RGROUP-ENV0247 
VZTFARATGOD11E024701 RGROUP-ENV0247

The output looks like:

RGROUP-ENV0247
VZTFARATGJBBCE024701
VZTFARATGJBFFE024701
VZTFARATGJBSTE024701
VZTFARATGOD11E024701

RGROUP-GIMAGES
VZTFARATGJBBCEGIM01
VZTFARATGJBFFEGIM01
VZTFARATGJBSTEGIM01

And how it works:

awk -F'[ ]' '      # -F sets the field separator to a single space
{
    # append $1 and a newline to the string stored under key $2
    b[$2] = b[$2] $1 "\n"
}
END {
    # print each key followed by its accumulated values
    # (each value string already ends in a newline, so print adds a blank line)
    for (c in b) {
        print c; print b[c]
    }
}' test.txt        # file to read from
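
The order in which `for (c in b)` visits keys is unspecified in awk, so the groups may come out in any order. A minimal sketch of one way to get a deterministic listing, assuming sorted order is acceptable: sort on group and member first, then let awk print a header whenever the group changes. The sample data is inlined here only for illustration (a shortened copy of the input with one duplicated line), and the variable names `input`, `grouped`, and `prev` are not from the original answer.

```shell
# Sample input inlined for illustration; the original reads test.txt.
input='VZTFARATGJBBCEGIM01 RGROUP-GIMAGES
VZTFARATGJBFFEGIM01 RGROUP-GIMAGES
VZTFARATGJBBCE024701 RGROUP-ENV0247
VZTFARATGJBBCEGIM01 RGROUP-GIMAGES
VZTFARATGJBFFE024701 RGROUP-ENV0247'

# Sort on column 2, then column 1; -u drops lines whose keys repeat.
# LC_ALL=C keeps the ordering byte-wise and locale independent.
grouped=$(printf '%s\n' "$input" |
    LC_ALL=C sort -u -k2,2 -k1,1 |
    awk '
        $2 != prev {                 # first line of a new group
            if (NR > 1) print ""     # blank line between groups
            print $2                 # group name
            prev = $2
        }
        { print $1 }                 # group member
    ')

printf '%s\n' "$grouped"
```

Since `sort -u` already removes repeated pairs, the awk part needs no array at all and holds only one line in memory at a time.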

Answer 1: (score: 1)

Using GNU awk and a 2D array to weed out duplicates in $1:

$ awk '{
    a[$2][$1]           # hash to a. This weeds out duplicates in $1
}
END {
    for(i in a) {       # all groups
        print i         # output name
        for(j in a[i])  # all group members
            print j     # output member
        print ""        # empty line after each group
    }
}' file

Output:

RGROUP-ENV0247
VZTFARATGOD11E024701
VZTFARATGJBSTE024701
VZTFARATGJBBCE024701
VZTFARATGJBFFE024701

RGROUP-GIMAGES
VZTFARATGJBBCEGIM01
VZTFARATGJBSTEGIM01
VZTFARATGJBFFEGIM01

Another one for non-GNU awks. It uses match to test whether $1 is already stored in a[$2]:

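$ awk '
{
    if(!match(a[$2],"(^|\n)" $1 "($|\n)"))   # $1 not yet stored for this group
        a[$2]=a[$2] "\n" $1                  # append it on its own line
}
END {
    for(i in a) {
        print i a[i]                         # group name, then its members
        print ""                             # empty line after each group
    }
}' file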

The output ends with an empty line.
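
Because match treats $1 as a regular expression, a value containing regex metacharacters such as `.` could match a different member that is already stored and be skipped by mistake. A minimal sketch that sidesteps this with a lookup array keyed on the (group, member) pair, and that also keeps groups in first-seen order. The array names `seen`, `members`, and `order` and the tiny sample data are made up for the demonstration and are not part of the original answer.

```shell
# Hypothetical sample: "a.c" as a regex would also match the stored "abc".
input='abc G1
a.c G1
abc G1
x G2'

result=$(printf '%s\n' "$input" | awk '
    !seen[$2, $1]++ {                          # first time this (group, member) pair appears
        if (!($2 in members)) order[++n] = $2  # remember the order groups first appear in
        members[$2] = members[$2] "\n" $1      # append member on its own line
    }
    END {
        for (i = 1; i <= n; i++) {
            print order[i] members[order[i]]   # group name, then its members
            print ""                           # empty line after each group
        }
    }
')

printf '%s\n' "$result"
```

The `seen[$2, $1]` subscript joins both fields with SUBSEP, so the comparison is an exact string lookup and works in any POSIX awk.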