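In the awk below I am trying to append :p.= to each entry in $7, but only if the field contains the pattern /NM/. If there is only one NM entry in $7, as in line 2, the script below seems to do that. But if there are multiple NM entries in $7, as in line 3, then :p.= is only added to the last one. The ; is used to separate multiple entries within the field. I added comments, but I am not sure what I am missing. Thank you :).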
Input (tab-delimited):
R_Index Chr Start End Ref Alt Detail.refGene Gene.refGene
1 chr1 948846 948846 - A dist=1 ISG15
2 chr1 948870 948870 C G NM_005101:c.-84C>G ISG15
3 chr1 948921 948921 T C NM_005101:c.-33T>C;NM_005101:c.-84C>G ISG15
4 chr1 949654 949654 A G . ISG15
awk:
awk '
BEGIN { FS=OFS="\t" } # define FS and OFS as tab and start processing
$7 ~ /NM/ { # look for pattern NM in $7
# split $7 by ";" and cycle through them
i=split($7,NM,";")
for (n=1; n<=i; n++) {
sub("$", ":p=", $7) # add :p. to end off each $7 before the ;
} # close block
}1' input # define input file
Current output (tab-delimited):
R_Index Chr Start End Ref Alt Detail.refGene Gene.refGene
1 chr1 948846 948846 - A dist=1 ISG15
2 chr1 948870 948870 C G NM_005101:c.-84C>G:p.= ISG15
3 chr1 948921 948921 T C NM_005101:c.-33T>C;NM_005101:c.-84C>G:p.=p.= ISG15
4 chr1 949654 949654 A G . ISG15
Answer 0 (score: 2)
Replace this:
i=split($7,NM,";")
for (n=1; n<=i; n++) {
sub("$", ":p=", $7) # add :p. to end off each $7 before the ;
}
with this:
out=""
i=split($7,NM,/;/)                       # split $7 on ";" into array NM
for (n=1; n<=i; n++) {
    sub(/$/, ":p.=", NM[n])              # add :p.= to the end of each NM[n]
    out = (out=="" ? "" : out";") NM[n]  # rejoin the pieces with ";"
}
$7 = out                                 # write the rebuilt value back to the field
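The original loop calls sub() on $7 itself, so every iteration appends to the end of the whole field and the array produced by split() is never used. As a rough sketch of how the complete script might look with the change applied, the block below wraps it in a shell function. The function name add_p is only a hypothetical name for illustration, the string concatenation is a slightly more compact variant of the sub() call, and the sample rows are copied from the question's input with tabs written as \t.

```shell
#!/bin/sh
# add_p is a hypothetical helper name; it runs the full awk program
# on the files given as arguments, or on standard input.
add_p() {
    awk '
    BEGIN { FS = OFS = "\t" }          # tab-delimited input and output
    $7 ~ /NM/ {                        # only touch rows whose $7 contains NM
        out = ""
        k = split($7, NM, /;/)         # one array element per ";"-separated entry
        for (n = 1; n <= k; n++)       # append :p.= to each entry and rejoin with ";"
            out = (n == 1 ? "" : out ";") NM[n] ":p.="
        $7 = out                       # assigning $7 rebuilds the record with OFS
    }
    1                                  # print every record, modified or not
    ' "$@"
}

# Sample rows from the question (lines 1, 2 and 3 of the input).
printf '1\tchr1\t948846\t948846\t-\tA\tdist=1\tISG15\n' | add_p
printf '2\tchr1\t948870\t948870\tC\tG\tNM_005101:c.-84C>G\tISG15\n' | add_p
printf '3\tchr1\t948921\t948921\tT\tC\tNM_005101:c.-33T>C;NM_005101:c.-84C>G\tISG15\n' | add_p
```

With real data the call would simply be add_p input. Rows whose $7 does not match /NM/, such as dist=1 or ., pass through unchanged.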