awk: count and rename a symbol in a field

Time: 2016-08-26 15:24:52

Tags: awk

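I am trying to count the symbol - in $5 (Ref) and use awk to output the renamed symbol along with its count. The input file is tab-delimited. The awk below is close, but the output is incorrect and I don't know how to fix it. Thank you :).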

AWK

awk -F'\t' 'BEGIN {printf "Category\tCount\n" } $5 ~ /-/ {printf "indel" } {a[$5]++} END { for (i in a) {printf "%s\t\t%s\n",i , a[i] }}' input

Input

Index   Mutation Call   Start   End Ref Alt Func.refGene    Gene.refGene    ExonicFunc.refGene  Sanger
13  c.[1035-3T>C]+[1035-3T>C]   166170127   166170127   T   C   intronic    SCN2A       
16  c.[2994C>T]+[=] 166210776   166210776   C   T   exonic  SCN2A   synonymous SNV  
19  c.[4914T>A]+[4914T>A]   166245230   166245230   T   A   exonic  SCN2A   synonymous SNV  
20  c.[5109C>T]+[=] 166245425   166245425   C   T   exonic  SCN2A   synonymous SNV  
21  c.[5139C>T]+[=] 166848646   166848646   G   A   exonic  SCN1A   synonymous SNV  
22  c.3152_3153insAACCACT   166892841   166892841   -   AGTGGTT exonic  SCN1A   frameshift insertion    TP
23  c.2044-5delT    166898947   166898947   A   -   intronic    SCN1A       
25  c.1530_1531insA 166901684   166901684   -   T   exonic  SCN1A   frameshift insertion    FP 

Current output

Category    Count
indelindelindelindel        5
A       4
C       7
Ref     1
-       4
G       2
T       6
TCCT        1

Desired output

Category    Count
indel        2

1 answer:

Answer 0 (score: 2)

Here you go...

$ awk -F'\t' '$5=="-"{count++} 
                  END{print "Category","Count"; 
                      print "indel",count}' file | 
  column -t

Category  Count
indel     2
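
As for why the attempt in the question misbehaves: `$5 ~ /-/ {printf "indel"}` prints the word immediately for every matching line, with no newline, so the copies pile up in front of the END output. Meanwhile `{a[$5]++}` counts every distinct value of $5, header included. Below is a minimal self-contained sketch of the counting approach. The file name sample.tsv and its four data rows are assumptions made for illustration (a subset of the rows in the question), and `count + 0` is a small addition so that a file with no matches prints 0 rather than an empty field.

```shell
# Build a small tab-separated sample (sample.tsv is a hypothetical file name)
printf 'Index\tMutation Call\tStart\tEnd\tRef\tAlt\n' > sample.tsv
printf '21\tc.[5139C>T]+[=]\t166848646\t166848646\tG\tA\n' >> sample.tsv
printf '22\tc.3152_3153insAACCACT\t166892841\t166892841\t-\tAGTGGTT\n' >> sample.tsv
printf '23\tc.2044-5delT\t166898947\t166898947\tA\t-\n' >> sample.tsv
printf '25\tc.1530_1531insA\t166901684\t166901684\t-\tT\n' >> sample.tsv

# Count rows whose Ref field ($5) is exactly "-" and report them as "indel".
# The == comparison avoids matching longer values that merely contain a "-".
result=$(awk -F'\t' -v OFS='\t' '
    $5 == "-" { count++ }
    END {
        print "Category", "Count"
        print "indel", count + 0   # + 0 forces a numeric 0 when nothing matched
    }' sample.tsv)

printf '%s\n' "$result"
```

Row 23 has "-" in Alt ($6) rather than Ref, so it is not counted and the sample yields indel 2. Setting OFS to a tab keeps the output tab-delimited like the input; piping through column -t, as above, works equally well for display.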