Separating a gawk header into columns

Time: 2015-12-24 17:53:39

Tags: gawk

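I am trying to use gawk to split the header into 3 fields, but I can't seem to get the desired result: $1 is the Target column, $2 is the Gene|GC column, and $3 is the Average column.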

GAWK

gawk  '{sub(/-[0-9]+/,"",$2); ar[$2]=$0}
        END{n = asort(ar)
                 printf "%-8s%8s%8s\n", "Target", "Gene|GC", "Average Depth" 
            for (i = 1; i <= n; i++)
                 print ar[i]}' OFS='\t' file

Input

 chr2:198299650-198299769 SF3B1-823|gc=51.3 143.1
 chr17:42153038-42153421 G6PC3-1981|gc=61.6 406.7
 chr13:32903545-32903664 BRCA2-318|gc=27.7 39.6
 chr17:56811469-56811593 RAD51C-2465|gc=44.4 228.5   

Current output

TargetGene|GCAverage Depth
chr10:79793602-79793721 RPS24|gc=59.7   150.3
chr10:79795083-79795202 RPS24|gc=41.2   111.4
chr10:79797665-79797784 RPS24|gc=37 69.8
chr10:79799902-79800021 RPS24|gc=39.5   134.5

Desired output

Target                  Gene|GC         Average Depth
chr10:79793602-79793721 RPS24|gc=59.7   150.3
chr10:79795083-79795202 RPS24|gc=41.2   111.4
chr10:79797665-79797784 RPS24|gc=37 69.8

1 answer:

Answer 0 (score: 0)

It looks like all I needed was:

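gawk '{ sub(/-[0-9]+/, "", $2)   # strip the "-NNN" suffix from the gene name
        ar[$2] = $0 }            # keep one record per Gene|GC key
      END { n = asort(ar)        # sort the stored records by value
            print "Target", "Gene|GC", "Average Depth"   # header joined by OFS
            for (i = 1; i <= n; i++)
                print ar[i] }' OFS='\t' file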
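
The header comes out separated here because `print` with comma-separated arguments joins them with OFS, which the command line sets to a tab. An assignment such as `OFS='\t'` placed after the program only takes effect once awk starts processing its input operands, so it is visible in END but would not be in BEGIN. A minimal sketch of that behavior, assuming plain `awk` and a one-line placeholder input instead of the real file (both are stand-ins for illustration):

```shell
# Hypothetical stand-in for the real input: a single placeholder line on stdin.
# OFS is assigned after the program, as in the command above, so it is set
# before the input is read and is therefore in effect when END runs.
header=$(printf 'placeholder\n' | awk '
END { print "Target", "Gene|GC", "Average Depth" }' OFS='\t')

# Show the result; the three header fields are separated by tabs.
printf '%s\n' "$header"
```

Moving the header `print` into a BEGIN block would need `-v OFS='\t'` before the program instead, since BEGIN runs before operand assignments are applied.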

I'm not sure whether this is the best way, but the output is fine. Thank you :).
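
If tab stops do not give stable alignment, one possible alternative is to format the header and the rows with the same `printf` format. A width in a `printf` format is only a minimum, so a 13-character value such as "Average Depth" overflows `%8s` and butts up against the previous field when the format has no separator. The sketch below assumes widths of 26 and 18, chosen only to fit the sample input from the question. It skips the `asort` step so that it also runs in a non-GNU awk.

```shell
# Sample rows copied from the input shown in the question.
input='chr2:198299650-198299769 SF3B1-823|gc=51.3 143.1
chr17:42153038-42153421 G6PC3-1981|gc=61.6 406.7
chr13:32903545-32903664 BRCA2-318|gc=27.7 39.6
chr17:56811469-56811593 RAD51C-2465|gc=44.4 228.5'

# The widths 26 and 18 are assumptions sized to this sample; adjust for real data.
output=$(printf '%s\n' "$input" | awk '
BEGIN {
    fmt = "%-26s%-18s%s\n"                       # left-justify the first two columns
    printf fmt, "Target", "Gene|GC", "Average Depth"
}
{
    sub(/-[0-9]+/, "", $2)                       # strip the "-NNN" suffix from the gene name
    printf fmt, $1, $2, $3                       # same format as the header
}')

printf '%s\n' "$output"
```

Because the header and every row share one format string, the columns stay aligned regardless of how the terminal renders tabs. The cost is that the output is padded with spaces rather than tab-delimited.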