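I want to use awk to split the first column of a file on "(" and count how many times each value of the second field produced by that split occurs.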
cluster1(2 genes, 2 taxa): column2 column 3
cluster1(2 genes, 2 taxa): column2 column 3
cluster1(3 genes, 2 taxa): column2 column 3
cluster1(3 genes, 2 taxa): column2 column 3
cluster1(4 genes, 2 taxa): column2 column 3
So my output would be:
2 genes, 2 taxa = 2
3 genes, 2 taxa = 2
4 genes, 2 taxa = 1
Thanks for your help, Kate
Answer 0 (score: 0)
$ awk -F '[()]' '{arr[$2]++} END{for(i in arr) print i " = " arr[i]}' data
4 genes, 2 taxa = 1
3 genes, 2 taxa = 2
2 genes, 2 taxa = 2
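The one-liner above sets the field separator to either parenthesis with `-F '[()]'`, so `$2` is the text between "(" and ")", and it uses that text as the key of a counting array. awk does not specify the iteration order of `for (i in arr)`, which is why the lines come out in a different order than in the question. Below is a minimal sketch that pipes the result through `sort` to get a stable order; the temporary file and the names `data`, `count`, `key`, and `result` are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical sample file reproducing the rows from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
cluster1(2 genes, 2 taxa): column2 column 3
cluster1(2 genes, 2 taxa): column2 column 3
cluster1(3 genes, 2 taxa): column2 column 3
cluster1(3 genes, 2 taxa): column2 column 3
cluster1(4 genes, 2 taxa): column2 column 3
EOF

# Split each line on "(" or ")" so that $2 is the text inside the parentheses,
# count each distinct value, then sort because awk's for-in order is unspecified.
result=$(awk -F '[()]' '{count[$2]++} END {for (key in count) print key " = " count[key]}' "$data" | sort)

printf '%s\n' "$result"
rm -f "$data"
```

Sorting after counting keeps the awk part unchanged and only affects the order in which the totals are printed.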
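Or use a pipeline that does the counting with uniq (the input is sorted first, because uniq -c only merges adjacent duplicate lines):
$ grep -oP '(?<=\().*(?=\))' data | sort | uniq -c | awk '{print $2,$3,$4,$5 " =",$1}'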
2 genes, 2 taxa = 2
3 genes, 2 taxa = 2
4 genes, 2 taxa = 1