awk for a field

Time: 2017-09-14 13:52:37

Tags: awk

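In the awk below I am trying to add :p.= to each NM entry in $7, provided the field contains the pattern /NM/. If there is only one NM in $7, as in row 2, the code below seems to do that. But if there are multiple NM entries in $7, as in row 3, then :p.= is added only to the last one. The ; separates multiple entries within the field. I added comments, but I am not sure which necessary step I am missing. Thank you :).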

Input (tab-delimited):

R_Index Chr Start   End Ref Alt Detail.refGene  Gene.refGene
1   chr1    948846  948846  -   A   dist=1  ISG15
2   chr1    948870  948870  C   G   NM_005101:c.-84C>G  ISG15
3   chr1    948921  948921  T   C   NM_005101:c.-33T>C;NM_005101:c.-84C>G   ISG15
4   chr1    949654  949654  A   G   .   ISG15

awk:

      awk '
      BEGIN { FS=OFS="\t" }   # define FS and OFS as tab and start processing
      $7 ~ /NM/ {             # look for pattern NM in $7
          # split $7 by ";" and cycle through them
          i=split($7,NM,";")
          for (n=1; n<=i; n++) {
              sub("$", ":p=", $7)   # add :p. to end off each $7 before the ;
          }   # close block
      }1' input   # define input file

Current output (tab-delimited):

R_Index Chr Start   End Ref Alt Detail.refGene  Gene.refGene
1   chr1    948846  948846  -   A   dist=1  ISG15
2   chr1    948870  948870  C   G   NM_005101:c.-84C>G:p.=  ISG15
3   chr1    948921  948921  T   C   NM_005101:c.-33T>C;NM_005101:c.-84C>G:p.=p.=   ISG15
4   chr1    949654  949654  A   G   .   ISG15

Desired output (tab-delimited): the same rows, but with :p.= appended to every NM entry in $7, so that row 3 has one :p.= after each of its two entries.

1 Answer:

Answer 0 (score: 2)

Replace this:

      i=split($7,NM,";")
         for (n=1; n<=i; n++) {
          sub("$", ":p=", $7)   # add :p. to end off each $7 before the ;
         }

With this:

      out=""
      i=split($7,NM,/;/)
      for (n=1; n<=i; n++) {
          sub(/$/, ":p=", NM[n])   # add :p= to the end of each entry NM[n], before the ;
          out = (out=="" ? "" : out";") NM[n]   # re-join the entries with ;
      }
      $7 = out
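
For anyone who wants to try the idea end to end, here is a minimal self-contained sketch. It is not the asker's exact script: the wrapper name append_p and the array name parts are made up for this illustration, and it appends :p.= (as the question text asks) rather than the :p= used in the snippets above. The point it shows is that each split piece has to be changed and then re-joined, because calling sub() on $7 inside the loop only ever appends at the end of the whole field.

```shell
# append_p is a hypothetical helper name; it reads tab-delimited rows
# from the files given as arguments, or from stdin when there are none.
append_p() {
  awk 'BEGIN { FS = OFS = "\t" }          # tab in, tab out
  $7 ~ /NM/ {                             # only rows whose 7th field mentions NM
      n = split($7, parts, ";")           # one array element per ;-separated entry
      out = ""
      for (k = 1; k <= n; k++)            # suffix every entry, then re-join with ;
          out = out (k > 1 ? ";" : "") parts[k] ":p.="
      $7 = out                            # assigning a field rebuilds $0 using OFS
  }
  1' "$@"                                 # 1 = print every row, changed or not
}

# Row 3 from the question: both NM entries should now end in :p.=
printf '3\tchr1\t948921\t948921\tT\tC\tNM_005101:c.-33T>C;NM_005101:c.-84C>G\tISG15\n' | append_p
```

With the question's data this could be run as append_p input. Rows whose $7 has no NM (such as dist=1 or .) pass through unchanged.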