How to count unique strings across multiple columns using only awk
My input file, c.txt:
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
I can do this with three separate commands. Is there a way to get all of the output from a single command?
awk '{a[$1]++}END{for (i in a)print i,a[i]}' c.txt
awk '{a[$1" "$2]++}END{for (i in a)print i,a[i]}' c.txt
awk '{a[$1" "$2" "$3]++}END{for (i in a)print i,a[i]}' c.txt
The output I want is:
IN 20 A 20 one 5
IN 20 A 20 three 5
IN 20 A 20 two 10
LK 20 C 20 one 5
LK 20 C 20 three 10
LK 20 C 20 two 5
US 60 A 20 one 10
US 60 A 20 three 5
US 60 A 20 two 5
US 60 B 40 one 10
US 60 B 40 three 10
US 60 B 40 two 20
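Column 2 is the number of input lines that share that value of column 1.
Column 4 is the number of input lines that share that combination of columns 1 and 2.
Column 6 is the number of input lines that share that combination of columns 1, 2, and 3.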
Answer 0: (score: 3)
With GNU awk, you can use the following script:
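$ cat count.awk
{
    lines[$0] = $0              # remember each distinct input line
    count1[$1]++                # occurrences of column 1
    count2[$1,$2]++             # occurrences of columns 1 and 2 together
    count3[$1,$2,$3]++          # occurrences of columns 1, 2, and 3 together
}
END {
    # asorti (a gawk extension) replaces lines with its sorted indices
    n = asorti(lines)
    for (i = 1; i <= n; i++) {
        split(lines[i], field, FS)
        total1 = count1[field[1]]
        total2 = count2[field[1],field[2]]
        total3 = count3[field[1],field[2],field[3]]
        print field[1], total1, field[2], total2, field[3], total3
    }
}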
To run the script on your file (awk must be GNU awk here, because asorti is a gawk extension):
$ awk -f count.awk file
IN 20 A 20 one 5
IN 20 A 20 three 5
IN 20 A 20 two 10
LK 20 C 20 one 5
LK 20 C 20 three 10
LK 20 C 20 two 5
US 60 A 20 one 10
US 60 A 20 three 5
US 60 A 20 two 5
US 60 B 40 one 10
US 60 B 40 three 10
US 60 B 40 two 20
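Where gawk is not available, one possible alternative is to read the file twice: the first pass only tallies, and the second pass prints each distinct line once with its totals. This is a minimal sketch rather than part of the answer above. The function name count_two_pass and the four-line sample are made up for illustration, and the output comes out in order of first appearance rather than sorted.

```shell
# Hypothetical helper: reads the file twice, so it needs a regular file, not a pipe.
count_two_pass() {
    awk '
    NR == FNR {                     # first pass: only tally
        c1[$1]++
        c2[$1, $2]++
        c3[$1, $2, $3]++
        next
    }
    !seen[$1, $2, $3]++ {           # second pass: first time this triple is seen
        print $1, c1[$1], $2, c2[$1, $2], $3, c3[$1, $2, $3]
    }' "$1" "$1"
}

# Small made-up sample, not the c.txt from the question.
tmp=$(mktemp)
printf '%s\n' 'US A one' 'IN A two' 'US A one' 'US B two' > "$tmp"
count_two_pass "$tmp"
rm -f "$tmp"
# prints:
# US 3 A 2 one 2
# IN 1 A 1 two 1
# US 3 B 1 two 1
```

Piping the result through sort would give the same ordering as the asorti version.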
Answer 1: (score: 2)
Try this awk one-liner:
$ awk '{a[$1]++;b[$1,$2]++;c[$1,$2,$3]++}END{for (i in c) {split (i, d, SUBSEP); print d[1],a[d[1]],d[2],b[d[1],d[2]],d[3],c[d[1],d[2],d[3]] } }' file | sort
IN 20 A 20 one 5
IN 20 A 20 three 5
IN 20 A 20 two 10
LK 20 C 20 one 5
LK 20 C 20 three 10
LK 20 C 20 two 5
US 60 A 20 one 10
US 60 A 20 three 5
US 60 A 20 two 5
US 60 B 40 one 10
US 60 B 40 three 10
US 60 B 40 two 20
Or, in a more readable format:
$ awk '
{
    a[$1]++
    b[$1,$2]++
    c[$1,$2,$3]++
}
END {
    for (i in c) {
        split (i, d, SUBSEP);
        print d[1], a[d[1]],
              d[2], b[d[1], d[2]],
              d[3], c[d[1], d[2], d[3]]
    }
}' file | sort
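The split on SUBSEP works because awk stores a comma-separated subscript such as c[$1,$2,$3] as a single string, with the parts joined by SUBSEP. The following minimal sketch illustrates this on a one-line sample. The function name show_subsep_key is made up for illustration, and the sketch assumes no field contains the SUBSEP character itself (by default "\034").

```shell
# Hypothetical demo: shows that a multi-subscript key is one SUBSEP-joined string.
show_subsep_key() {
    printf 'US A one\n' | awk '
    {
        c[$1, $2, $3]++
    }
    END {
        for (key in c) {
            n = split(key, d, SUBSEP)       # recover the three original fields
            print n, d[1], d[2], d[3]
            same = (key == ("US" SUBSEP "A" SUBSEP "one"))
            print "matches manual join:", same
        }
    }'
}

show_subsep_key
# prints:
# 3 US A one
# matches manual join: 1
```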