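I have two columns in a text file. For each value that repeats in the second column, I want to print all the distinct first-column values associated with it.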
VZTFARATGJBBCEGIM01 RGROUP-GIMAGES
VZTFARATGJBFFEGIM01 RGROUP-GIMAGES
VZTFARATGJBSTEGIM01 RGROUP-GIMAGES
VZTFARATGJBBCE024701 RGROUP-ENV0247
VZTFARATGJBFFE024701 RGROUP-ENV0247
VZTFARATGJBSTE024701 RGROUP-ENV0247
VZTFARATGOD11E024701 RGROUP-ENV0247
Desired output:
GROUP-ENV0247
VZTFARATGJBBCE024701
VZTFARATGJBFFE024701
VZTFARATGJBSTE024701
VZTFARATGOD11E024701
GROUP-GIMAGES
VZTFARATGAWSTEGIM01
VZTFARATGENTFEGIM01
VZTFARATGJBBCEGIM01
VZTFARATGJBFFEGIM01
Answer 0 (score: 1)
Here is a solution in awk.
awk -F'[ ]' '{ b[$2]=b[$2] $1 "\n" } END { for (c in b) { print c; print b[c] }}' test.txt
where test.txt contains the following values:
VZTFARATGJBBCEGIM01 RGROUP-GIMAGES
VZTFARATGJBFFEGIM01 RGROUP-GIMAGES
VZTFARATGJBSTEGIM01 RGROUP-GIMAGES
VZTFARATGJBBCE024701 RGROUP-ENV0247
VZTFARATGJBFFE024701 RGROUP-ENV0247
VZTFARATGJBSTE024701 RGROUP-ENV0247
VZTFARATGOD11E024701 RGROUP-ENV0247
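The output looks like this (each group is also followed by a blank line, because b[c] already ends in a newline and print adds another; the order of the groups may vary between awk implementations):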
RGROUP-ENV0247
VZTFARATGJBBCE024701
VZTFARATGJBFFE024701
VZTFARATGJBSTE024701
VZTFARATGOD11E024701
RGROUP-GIMAGES
VZTFARATGJBBCEGIM01
VZTFARATGJBFFEGIM01
VZTFARATGJBSTEGIM01
And how it works:
awk -F'[ ]' '    # -F: split fields on a single space
{
    # append $1 and a newline to the array entry keyed by $2
    b[$2] = b[$2] $1 "\n"
}
END {
    # print each key, followed by the values collected for it
    for (c in b) {
        print c; print b[c]
    }
}' test.txt      # file to read from
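The command above appends every $1 it sees, so a line that occurs twice in the input is printed twice. If "distinct" is meant strictly, one possible variation is sketched below. It assumes whitespace-separated columns, and the names group_distinct, seen, members and order are made up for illustration. It skips repeated (group, member) pairs and prints the groups in the order they first appear, using only POSIX awk features.

```shell
# Sketch: group column 1 under column 2, skipping repeated pairs and
# keeping groups in first-seen order. Uses only POSIX awk features.
group_distinct() {
    awk '
    NF >= 2 && !(($2, $1) in seen) {   # first time this (group, member) pair appears
        seen[$2, $1] = 1
        if (!($2 in members))
            order[++n] = $2            # remember the order in which groups first appear
        members[$2] = members[$2] $1 "\n"
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s\n%s", order[i], members[order[i]]
    }' "$@"
}

# Hypothetical sample input containing one repeated line
printf '%s\n' \
    'VZTFARATGJBBCEGIM01 RGROUP-GIMAGES' \
    'VZTFARATGJBFFEGIM01 RGROUP-GIMAGES' \
    'VZTFARATGJBBCEGIM01 RGROUP-GIMAGES' \
    'VZTFARATGJBBCE024701 RGROUP-ENV0247' |
group_distinct
```

Called with a file name (group_distinct test.txt) it reads that file instead of standard input.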
Answer 1 (score: 1)
Using GNU awk (4.0 or later, for arrays of arrays) and a 2D array to weed out duplicates in $1:
$ awk '{
a[$2][$1] # hash to a. This weeds out duplicates in $1
}
END {
for(i in a) { # all groups
print i # output name
for(j in a[i]) # all group members
print j # output member
print "" # empty line after each group
}
}' file
Output:
RGROUP-ENV0247
VZTFARATGOD11E024701
VZTFARATGJBSTE024701
VZTFARATGJBBCE024701
VZTFARATGJBFFE024701
RGROUP-GIMAGES
VZTFARATGJBBCEGIM01
VZTFARATGJBSTEGIM01
VZTFARATGJBFFEGIM01
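Note that the order in which for(i in a) visits the elements is unspecified, which is why the members above are not sorted. If a stable, sorted order matters, one alternative (not part of the answer above, just a sketch) is to let sort do both the de-duplication and the ordering and have awk print a header whenever the group changes. The function name group_sorted is hypothetical, and LC_ALL=C is assumed so that the ordering does not depend on the locale.

```shell
# Sketch: sort by group (column 2), then member (column 1), dropping
# repeated pairs with -u; awk prints a header each time the group changes.
group_sorted() {
    LC_ALL=C sort -u -k2,2 -k1,1 "$@" |
    awk '
    NR == 1 || $2 != prev {      # a new group starts
        if (NR > 1) print ""     # blank line between groups, none at the end
        print $2
        prev = $2
    }
    { print $1 }'
}
```

Usage would be group_sorted file. Because the input reaches awk already grouped, no arrays are needed and memory use stays constant on the awk side.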
Another one for non-GNU awks. It uses match to test whether $1 is already present in a[$2]:
$ awk '
{
if(!match(a[$2],"(^|\n)" $1 "($|\n)"))
a[$2]=a[$2] "\n" $1
}
END {
for(i in a) {
print i a[i]
print ""
}
}' file
There will be an empty line at the end.
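One caveat with the match version: $1 is pasted into a regular expression, so a value containing characters such as . or + is matched as a pattern rather than literally (for example, a.c would be treated as already present if abc is). A possible variation, sketched here with the made-up function name group_literal, uses index() for a literal comparison and prints the blank line only between groups, so there is none at the end.

```shell
# Sketch: same idea as the match version, but with a literal substring
# test and a separator printed between groups instead of after each one.
group_literal() {
    awk '
    NF >= 2 {
        # a[$2] holds "\nmember1\nmember2..."; index() compares literally,
        # so regex metacharacters in $1 have no special meaning here
        if (!index(a[$2] "\n", "\n" $1 "\n"))
            a[$2] = a[$2] "\n" $1
    }
    END {
        for (i in a) {           # iteration order is unspecified in awk
            print sep i a[i]
            sep = "\n"           # blank line before every group except the first
        }
    }' "$@"
}
```

Usage would be group_literal file. The group order still depends on the awk implementation, as with any for-in loop.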