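I am trying to use gawk to split the header into 3 fields, but I can't seem to get the desired result: $1 is the Target column, $2 is the Gene|GC column, and $3 is the Average Depth column.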
GAWK
gawk '{sub(/-[0-9]+/,"",$2); ar[$2]=$0}
END{n = asort(ar)
printf "%-8s%8s%8s\n", "Target", "Gene|GC", "Average Depth"
for (i = 1; i <= n; i++)
print ar[i]}' OFS='\t' file
Input
chr2:198299650-198299769 SF3B1-823|gc=51.3 143.1
chr17:42153038-42153421 G6PC3-1981|gc=61.6 406.7
chr13:32903545-32903664 BRCA2-318|gc=27.7 39.6
chr17:56811469-56811593 RAD51C-2465|gc=44.4 228.5
Current output
TargetGene|GCAverage Depth
chr10:79793602-79793721 RPS24|gc=59.7 150.3
chr10:79795083-79795202 RPS24|gc=41.2 111.4
chr10:79797665-79797784 RPS24|gc=37 69.8
chr10:79799902-79800021 RPS24|gc=39.5 134.5
Desired output
Target Gene|GC Average Depth
chr10:79793602-79793721 RPS24|gc=59.7 150.3
chr10:79795083-79795202 RPS24|gc=41.2 111.4
chr10:79797665-79797784 RPS24|gc=37 69.8
Answer 0 (score: 0)
gawk '{ sub(/-[0-9]+/, "", $2); ar[$2] = $0 }   # strip the -NNN suffix from $2
END {
    n = asort(ar)                                # sort the stored lines
    print "Target", "Gene|GC", "Average Depth"   # header joined with OFS (tab)
    for (i = 1; i <= n; i++)
        print ar[i]
}' OFS='\t' file
Not sure if this is the best way, but the output is correct. Thank you :).
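A note on why the original header ran together: the numbers in `%-8s%8s%8s` are minimum widths, so `Average Depth` (13 characters) is never padded and lands directly against `Gene|GC`. Separating the header with the same tab used for the rows avoids that. Below is a minimal sketch of that idea, not the method from the answer above: it assumes plain POSIX `awk` plus `sort` (swapped in for gawk's `asort`), and `format_depth_table` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): reads the three-column
# input on stdin and writes a tab-separated table with one header line.
# Usage: format_depth_table < file > table.tsv
format_depth_table() {
    # Print the header from the shell so it is never mixed into the sort.
    printf 'Target\tGene|GC\tAverage Depth\n'
    awk 'BEGIN { OFS = "\t" }           # rows use the same separator as the header
         {
             sub(/-[0-9]+/, "", $2)     # "SF3B1-823|gc=51.3" becomes "SF3B1|gc=51.3"
             print $1, $2, $3
         }' | LC_ALL=C sort             # byte-order sort, assumed here in place of asort
}
```

Unlike `ar[$2] = $0`, this sketch does not key rows on `$2`, so two lines that end up with the same Gene|GC value are both kept.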