awk to split a tab-delimited input file using multiple delimiters in the same field

Time: 2017-06-22 17:55:41

Tags: awk

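I am trying to use awk to split the file below (skipping the header) into an 8-column and a 6-column output. I am not sure I have the split right, because I need to split $2 first on : and then on -. The desired output for each awk command is shown below it. Thank you :).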

The file is tab-delimited.

Gene    Position    Strand
SMARCB1 22:24133967-24133967    +
RB1 13:49037865-49037865    -
SMARCB1 22:24176357-24176357    +

AWK

awk -F'\t' -v OFS="\t" 'NR>1{split($2,a,":"); print a[1],a[2],a[3],"chr"$2,"0",$3,"GENE_ID="$1}'

Desired 8-column output (tab-delimited)

chr22   24133967    24133967    chr22:24133967-24133967 0   +   .   GENE_ID=SMARCB1
chr13   49037865    49037865    chr13:49037865-49037865 0   -   .   GENE_ID=RB1
chr22   24176357    24176357    chr22:24176357-24176357 0   +   .   GENE_ID=SMARCB1

AWK

awk -F'\t' -v OFS="\t" 'NR>1{split($2,a,":"); print a[1],a[2],a[3],"chr"$2,".",$1,}'

Desired 6-column output (tab-delimited)

chr22   24133967    24133967    chr22:24133967-24133967 .   SMARCB1
chr13   49037865    49037865    chr13:49037865-49037865 .   RB1
chr22   24176357    24176357    chr22:24176357-24176357 .   SMARCB1

1 answer:

Answer 0 (score: 2)

An extended approach:

For the 6-column output:

awk -v c=6 'BEGIN{ FS=OFS="\t" }NR>1{ split($2,a,":|-"); k="chr"; 
             printf("%s\t%d\t%d\t%s\t",k a[1],a[2],a[3],k $2); 
             if (c==6) print ".",$1; else print "0",$3,".","GENE_ID="$1 }' file

Output:

chr22   24133967    24133967    chr22:24133967-24133967 .   SMARCB1
chr13   49037865    49037865    chr13:49037865-49037865 .   RB1
chr22   24176357    24176357    chr22:24176357-24176357 .   SMARCB1
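The key change from the attempt in the question is the separator passed to split(): a separator longer than one character is treated as a regular expression, so ":|-" splits on either character in one call. A minimal sketch of the difference, where show_split is a hypothetical helper introduced only for this illustration:

```shell
# Hypothetical helper (not part of the original answer): print the number of
# pieces split() yields for value $1 with separator $2, followed by the pieces.
show_split() {
    printf '%s\n' "$1" | awk -v sep="$2" '{
        n = split($0, a, sep)                 # n is the number of pieces
        out = n
        for (i = 1; i <= n; i++) out = out " " a[i]
        print out
    }'
}

show_split "22:24133967-24133967" ":"      # prints: 2 22 24133967-24133967
show_split "22:24133967-24133967" ":|-"    # prints: 3 22 24133967 24133967
```

With ":" alone, a[3] is never set, which is why the commands in the question cannot produce separate start and end columns.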

For the 8-column output (selected by passing the column count in the -v c=<number> variable):

awk -v c=8 'BEGIN{ FS=OFS="\t" }NR>1{ split($2,a,":|-"); k="chr"; 
             printf("%s\t%d\t%d\t%s\t",k a[1],a[2],a[3],k $2); 
             if (c==6) print ".",$1; else print "0",$3,".","GENE_ID="$1 }' file

Output:

chr22   24133967    24133967    chr22:24133967-24133967 0   +   .   GENE_ID=SMARCB1
chr13   49037865    49037865    chr13:49037865-49037865 0   -   .   GENE_ID=RB1
chr22   24176357    24176357    chr22:24176357-24176357 0   +   .   GENE_ID=SMARCB1
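Since the two commands differ only in the value of c, they can be wrapped in one shell function for reuse. This is a sketch only: the function name to_bed, its argument order, and the temporary sample file are assumptions made for illustration, and the awk logic is the same as in the commands above.

```shell
# Hypothetical wrapper; the name to_bed and its argument order are illustrative.
to_bed() {
    # $1 = number of output columns (6 or 8), $2 = tab-delimited input file
    awk -v c="$1" 'BEGIN { FS = OFS = "\t" }
    NR > 1 {                                   # skip the header line
        split($2, a, ":|-")                    # a[1]=chromosome, a[2]=start, a[3]=end
        printf("%s\t%d\t%d\t%s\t", "chr" a[1], a[2], a[3], "chr" $2)
        if (c == 6) print ".", $1
        else        print "0", $3, ".", "GENE_ID=" $1
    }' "$2"
}

# Recreate part of the sample input from the question in a temporary file.
sample=$(mktemp)
printf 'Gene\tPosition\tStrand\n'           >  "$sample"
printf 'SMARCB1\t22:24133967-24133967\t+\n' >> "$sample"
printf 'RB1\t13:49037865-49037865\t-\n'     >> "$sample"

to_bed 6 "$sample"    # 6-column output
to_bed 8 "$sample"    # 8-column output
```

Any value of c other than 6 falls through to the 8-column branch, as in the original commands.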