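I'm trying to use awk to split file (skipping the header) into 8-column or 6-column output. I'm not sure my split is correct, because $2 has to be split on : first and then on - . The desired output for each awk command is shown below it. Thank you :).
The file is tab-delimited.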
Gene Position Strand
SMARCB1 22:24133967-24133967 +
RB1 13:49037865-49037865 -
SMARCB1 22:24176357-24176357 +
AWK
awk -F'\t' -v OFS="\t" 'NR>1{split($2,a,":"); print a[1],a[2],a[3],"chr"$2,"0",$3,"GENE_ID="$1}'
Desired 8-column output (tab-delimited)
chr22 24133967 24133967 chr22:24133967-24133967 0 + . GENE_ID=SMARCB1
chr13 49037865 49037865 chr13:49037865-49037865 0 - . GENE_ID=RB1
chr22 24176357 24176357 chr22:24176357-24176357 0 + . GENE_ID=SMARCB1
AWK
awk -F'\t' -v OFS="\t" 'NR>1{split($2,a,":"); print a[1],a[2],a[3],"chr"$2,".",$1,}'
Desired 6-column output (tab-delimited)
chr22 24133967 24133967 chr22:24133967-24133967 . SMARCB1
chr13 49037865 49037865 chr13:49037865-49037865 . RB1
chr22 24176357 24176357 chr22:24176357-24176357 . SMARCB1
Answer 0 (score: 2)
An extended approach:
For 6-column output:
awk -v c=6 'BEGIN{ FS=OFS="\t" }NR>1{ split($2,a,":|-"); k="chr";
printf("%s\t%d\t%d\t%s\t",k a[1],a[2],a[3],k $2);
if (c==6) print ".",$1; else print "0",$3,".","GENE_ID="$1 }' file
Output:
chr22 24133967 24133967 chr22:24133967-24133967 . SMARCB1
chr13 49037865 49037865 chr13:49037865-49037865 . RB1
chr22 24176357 24176357 chr22:24176357-24176357 . SMARCB1
For 8-column output (selected by passing the column count as the variable -v c=<number>):
awk -v c=8 'BEGIN{ FS=OFS="\t" }NR>1{ split($2,a,":|-"); k="chr";
printf("%s\t%d\t%d\t%s\t",k a[1],a[2],a[3],k $2);
if (c==6) print ".",$1; else print "0",$3,".","GENE_ID="$1 }' file
Output:
chr22 24133967 24133967 chr22:24133967-24133967 0 + . GENE_ID=SMARCB1
chr13 49037865 49037865 chr13:49037865-49037865 0 - . GENE_ID=RB1
chr22 24176357 24176357 chr22:24176357-24176357 0 + . GENE_ID=SMARCB1
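
The main difference from the attempt in the question appears to be the separator given to split(): ":|-" is treated as a regular expression, so the Position value is cut on both characters in one call. Here is a minimal sketch of that behaviour. The helper name split_position is made up for illustration and is not part of the answer.

```shell
# Hypothetical helper (not from the answer): show how awk's split() cuts a
# Position value for a given separator.
# $1 = Position value, $2 = separator handed to split()
split_position() {
    printf '%s\n' "$1" | awk -v sep="$2" '{
        n = split($0, a, sep)          # n is the number of pieces produced
        printf "%d", n
        for (i = 1; i <= n; i++)       # print each piece in brackets
            printf " [%s]", a[i]
        print ""
    }'
}

split_position "22:24133967-24133967" ":"     # prints: 2 [22] [24133967-24133967]
split_position "22:24133967-24133967" ":|-"   # prints: 3 [22] [24133967] [24133967]
```

With ":" alone, a[3] is never set, so the third column of the question's attempt comes out empty and the second still contains the whole start-end range.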