AWK: split a column and count occurrences of the second split field

Date: 2016-05-24 10:34:48

Tags: awk split

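I want to use awk to split the first column of a file on "(" and count how many times each value of the second field produced by the split occurs.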

cluster1(2 genes, 2 taxa):  column2 column 3
cluster1(2 genes, 2 taxa):  column2 column 3
cluster1(3 genes, 2 taxa):  column2 column 3
cluster1(3 genes, 2 taxa):  column2 column 3
cluster1(4 genes, 2 taxa):  column2 column 3

So my output would be

2 genes, 2 taxa = 2
3 genes, 2 taxa = 2
4 genes, 2 taxa = 1

Thanks for your help, Kate

1 answer:

Answer 0: (score: 0)

$ awk -F '[()]' '{arr[$2]++} END{for(i in arr) print i " = " arr[i]}' data 
4 genes, 2 taxa = 1
3 genes, 2 taxa = 2
2 genes, 2 taxa = 2
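
The one-liner above sets the field separator to either parenthesis, so $2 is the text between "(" and ")", and the array tallies each distinct value. Because awk does not define the traversal order of `for (i in arr)`, the lines come out in an arbitrary order (reversed here). A minimal sketch that makes the order deterministic by piping through sort; the function name `count_groups` is only illustrative and not part of the original answer:

```shell
# count_groups: hypothetical helper name, used only for this sketch.
# Reads the files given as arguments, or stdin when none are given.
count_groups() {
    # -F '[()]' splits each line at "(" or ")", so $2 is the text between them.
    # counts[$2]++ tallies each distinct value; END prints "value = count".
    # sort fixes the otherwise unspecified order of the for-in loop.
    awk -F '[()]' '{ counts[$2]++ }
        END { for (key in counts) print key " = " counts[key] }' "$@" | sort
}

# Sample input copied from the question.
count_groups <<'EOF'
cluster1(2 genes, 2 taxa):  column2 column 3
cluster1(2 genes, 2 taxa):  column2 column 3
cluster1(3 genes, 2 taxa):  column2 column 3
cluster1(3 genes, 2 taxa):  column2 column 3
cluster1(4 genes, 2 taxa):  column2 column 3
EOF
```

With the sample data this prints the three groups in ascending order, matching the output requested in the question.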

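Or use a pipeline that counts with uniq (this relies on identical lines being adjacent, as they are in the sample, and on GNU grep for the -P option):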

$ grep -oP '(?<=\().*(?=\))' data | uniq -c | awk '{print $2,$3,$4,$5 " =",$1}'
2 genes, 2 taxa = 2
3 genes, 2 taxa = 2
4 genes, 2 taxa = 1
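
If identical groups are not adjacent in the input, `uniq -c` counts each run separately. A hedged variant of the same pipeline that sorts first, uses sed instead of `grep -P` for portability, and does not assume the group text has exactly four words. The function name `count_groups_uniq` is hypothetical, and the sed pattern assumes one parenthesised group per line:

```shell
# count_groups_uniq: hypothetical helper name, used only for this sketch.
# Reads the files given as arguments, or stdin when none are given.
count_groups_uniq() {
    # 1. sed keeps only the text between the first "(" and the next ")";
    #    -n plus the p flag drops lines that contain no such group.
    # 2. sort puts identical groups next to each other for uniq.
    # 3. uniq -c prefixes each distinct group with its count.
    # 4. awk moves the count to the end: "group = count".
    sed -n 's/^[^(]*(\([^)]*\)).*/\1/p' "$@" |
        sort |
        uniq -c |
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 " = " n }'
}

# Usage with ungrouped input: the two "2 genes" lines are not adjacent.
count_groups_uniq <<'EOF'
cluster1(2 genes, 2 taxa):  column2 column 3
cluster1(3 genes, 2 taxa):  column2 column 3
cluster1(2 genes, 2 taxa):  column2 column 3
EOF
```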