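This script looks for duplicate entries in the first column and prints the corresponding entries from the second column, grouped together. I'd like to know how the script works.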
awk '{c[$1]++; k[$1]=k[$1] " " $2} END {for (i in c) {if (c[i]>1) print k[i]}}'
Answer 0 (score: 6)
{
    c[$1]++                  # count occurrences of each first-field value
    k[$1] = k[$1] " " $2     # append the second field to that key's list (leaves a leading space)
    # k[$1] = k[$1] $2 " "   # alternative: trailing space instead, so the output looks better
}
END {                        # after all input has been counted and concatenated
    for (i in c) {           # go through every key
        if (c[i] > 1)        # for keys that occurred more than once,
            print k[i]       # print their concatenated second fields
    }
}
For example, the input:
key1 data1
key1 data2
key2 data3
produces the output (preceded by a single space, because the separator is prepended to every value):
data1 data2
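To try this out, here is a minimal sketch that wraps the original one-liner in a shell function and feeds it the three sample lines. The function name `dup_values` is made up for this illustration; the awk program itself is unchanged from the question.

```shell
# Hypothetical helper name; the awk program is the one-liner from the question.
dup_values() {
  awk '{c[$1]++; k[$1]=k[$1] " " $2} END {for (i in c) {if (c[i]>1) print k[i]}}'
}

# Feed the three sample lines on standard input.
# key1 occurs twice, so its values are printed; key2 occurs once and is skipped.
printf 'key1 data1\nkey1 data2\nkey2 data3\n' | dup_values
```

Only one key is duplicated here, so exactly one line is printed. With several duplicated keys, the order of the output lines depends on how awk walks the array in `for (i in c)`, which is not specified.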
Answer 1 (score: 2)
If whoever wrote it had just used meaningful variable names and indentation, I bet you wouldn't even have had to ask:
awk '
    {
        count[$1]++
        values[$1] = values[$1] " " $2
    }
    END {
        for (key in count) {
            if (count[key] > 1) {
                print values[key]
            }
        }
    }
'
It can be written better with a ternary expression:
awk '
    { values[$1] = (count[$1]++ ? values[$1] " " : "") $2 }
    END {
        for (key in count) {
            if (count[key] > 1) {
                print values[key]
            }
        }
    }
'
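The ternary inserts the separator only between values, which avoids leading or trailing whitespace; a few other small improvements could be made as well.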
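As one possible example of such an improvement (my own suggestion, not something the answer spells out), the sketch below keeps the ternary and also records the order in which keys first appear, so the output no longer depends on the unspecified iteration order of `for (key in count)`. The names `dup_values_ordered`, `order`, and `n` are invented for this illustration.

```shell
# Hypothetical helper: same grouping logic, but output follows first-seen key order.
dup_values_ordered() {
  awk '
    {
      # Remember each key the first time it appears.
      if (!($1 in count)) order[++n] = $1
      # Add the separator only between values, never before the first one.
      values[$1] = (count[$1]++ ? values[$1] " " : "") $2
    }
    END {
      # Walk the keys in the order they were first seen.
      for (i = 1; i <= n; i++) {
        key = order[i]
        if (count[key] > 1) print values[key]
      }
    }
  '
}

# Keys b and a are both duplicated; b appears first, so its line is printed first.
printf 'b x\na y\nb z\na w\nc v\n' | dup_values_ordered
```

The extra `order` array costs one entry per distinct key. If the order of the output lines does not matter, the shorter version above is enough.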