In a tab-delimited file

Time: 2018-05-09 18:22:24

Tags: awk tab-delimited

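I want to split column 7 (header: otherinfo) on ":" and then insert the fourth, sixth and seventh pieces of that split after the second column, as separate columns with their own headers.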

The input file will have many rows:

Chr Start   End Ref Alt Func.   otherinfo
1   21  32  T   C   int 0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0
2   22  31  T   C   int 0/1:77:45:44:22:21:48.84%:1.8298E-8:31:35:22:0:21:0
3   23  30  T   C   int 0/1:87:40:38:9:21:70%:1.7919E-9:32:36:9:0:21:0
4   24  29  G   T   int 0/1:68:23:23:3:15:65.22%:1.4655E-7:40:33:3:0:15:0
5   25  28  C   T   int 1/1:55:17:17:4:13:76.47%:2.5647E-6:30:21:4:0:13:0
6   26  27  T   C   int 1/1:60:15:15:2:13:86.67%:8.7675E-7:38:24:2:0:13:0
7   27  26  C   T   int 0/1:181:1067:1067:1003:64:6%:6.9582E-19:39:39:1003:0:64:0
8   28  25  C   A   int 1/1:46:9:9:0:9:100%:2.0568E-5:0:38:0:0:9:0
9   29  24  T   A   int 0/1:255:356:356:170:186:52.25%:3.2158E-71:40:40:0:170:0:186
10  30  23  T   G   int 1/1:41:8:8:0:8:100%:7.77E-5:0:40:0:0:0:8
11  31  22  G   A   int 0/1:148:92:92:51:41:44.57%:1.387E-15:40:39:51:0:41:0
12  32  21  G   C   int 0/1:122:51:51:20:31:60.78%:5.6397E-13:36:35:20:0:31:0

The output file should look like this:

Chr Start   RD  AD  Per End Ref Alt Func.
1   21  66  21  31.82%  32  T   C   int
2   22  44  21  48.84%  31  T   C   int
3   23  38  21  70% 30  T   C   int
4   24  23  15  65.22%  29  G   T   int
5   25  17  13  76.47%  28  C   T   int
6   26  15  13  86.67%  27  T   C   int
7   27  1067    64  6%  26  C   T   int
8   28  9   9   100%    25  C   A   int
9   29  356 186 52.25%  24  T   A   int
10  30  8   8   100%    23  T   G   int
11  31  92  41  44.57%  22  G   A   int
12  32  51  31  60.78%  21  G   C   int

I tried to do the split with awk:

awk 'BEGIN {OFS=FS="\t"} {gsub(/\:/,"\t",$7)}1' input.txt >> output.txt

and got this output:

Chr Start   End Ref Alt Func.   otherinfo
    1   21  32  T   C   int 0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0
    2   22  31  T   C   int 0/1:77:45:44:22:21:48.84%:1.8298E-8:31:35:22:0:21:0
    3   23  30  T   C   int 0/1:87:40:38:9:21:70%:1.7919E-9:32:36:9:0:21:0
    4   24  29  G   T   int 0/1:68:23:23:3:15:65.22%:1.4655E-7:40:33:3:0:15:0
    5   25  28  C   T   int 1/1:55:17:17:4:13:76.47%:2.5647E-6:30:21:4:0:13:0
    6   26  27  T   C   int 1/1:60:15:15:2:13:86.67%:8.7675E-7:38:24:2:0:13:0
    7   27  26  C   T   int 0/1:181:1067:1067:1003:64:6%:6.9582E-19:39:39:1003:0:64:0
    8   28  25  C   A   int 1/1:46:9:9:0:9:100%:2.0568E-5:0:38:0:0:9:0
    9   29  24  T   A   int 0/1:255:356:356:170:186:52.25%:3.2158E-71:40:40:0:170:0:186
    10  30  23  T   G   int 1/1:41:8:8:0:8:100%:7.77E-5:0:40:0:0:0:8
    11  31  22  G   A   int 0/1:148:92:92:51:41:44.57%:1.387E-15:40:39:51:0:41:0
    12  32  21  G   C   int 0/1:122:51:51:20:31:60.78%:5.6397E-13:36:35:20:0:31:0

Could you tell me how I can do this?

Thanks in advance.

2 answers:

Answer 0: (score: 0)

The following awk may help you.

awk 'FNR==1{print "Chr\tStart\tRD\tAD\tPer\tEnd\tRef\tAlt\tFunc.";next}{split($NF,array,":");$2=$2 OFS array[4] OFS array[6] OFS array[7];$NF=""} 1' OFS="\t"  Input_file

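If the input is strictly TAB-delimited, change awk to awk -F"\t" so that fields are split on TABs only. The OFS="\t" placed before Input_file is what makes the output TAB-delimited. To write the result to a file, append > output_file to the end of the command above.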
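One detail worth noting about the one-liner above: assigning $NF="" empties the last field but does not remove it, so each output line ends with a trailing TAB. A minimal sketch of a variant that avoids this by printing the wanted fields explicitly is shown here. It assumes the input always has exactly seven TAB-separated columns; the file names sample.tsv and out.tsv are made up for the illustration, and the sample row is copied from the question.

```shell
# Build a two-line sample (header plus the first data row from the question).
printf 'Chr\tStart\tEnd\tRef\tAlt\tFunc.\totherinfo\n' > sample.tsv
printf '1\t21\t32\tT\tC\tint\t%s\n' \
  '0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0' >> sample.tsv

awk -F'\t' -v OFS='\t' '
FNR == 1 {
  # Replace the header with the requested column names.
  print "Chr", "Start", "RD", "AD", "Per", "End", "Ref", "Alt", "Func."
  next
}
{
  # Split column 7 on ":"; pieces 4, 6 and 7 become RD, AD and Per.
  split($7, a, ":")
  print $1, $2, a[4], a[6], a[7], $3, $4, $5, $6
}' sample.tsv > out.tsv

cat out.tsv
```

For the sample row this gives the fields 1, 21, 66, 21, 31.82%, 32, T, C, int, separated by single TABs and with no trailing separator.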

Here is the same solution in non-one-liner form.

awk '
FNR==1{
  print "Chr\tStart\tRD\tAD\tPer\tEnd\tRef\tAlt\tFunc.";
  next}
{
  split($NF,array,":");
  $2=$2 OFS array[4] OFS array[6] OFS array[7];
  $NF=""}
1
' OFS="\t"   Input_file

Answer 1: (score: 0)

This...

$ awk 'NR==1 {$2=$2 FS "RD AD Per"; NF--} 
       NR>1  {split($7,a,":"); NF--; 
              $2=$2 FS a[4] FS a[6] FS a[7]}1' file | column -t

Chr  Start  RD    AD   Per     End  Ref  Alt  Func.
1    21     66    21   31.82%  32   T    C    int
2    22     44    21   48.84%  31   T    C    int
3    23     38    21   70%     30   T    C    int
4    24     23    15   65.22%  29   G    T    int
5    25     17    13   76.47%  28   C    T    int
6    26     15    13   86.67%  27   T    C    int
7    27     1067  64   6%      26   C    T    int
8    28     9     9    100%    25   C    A    int
9    29     356   186  52.25%  24   T    A    int
10   30     8     8    100%    23   T    G    int
11   31     92    41   44.57%  22   G    A    int
12   32     51    31   60.78%  21   G    C    int
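The command above drops the last column with NF--, which gawk handles, although the gawk manual cautions that some awk versions do not rebuild $0 when NF is decremented. It also relies on column -t, which aligns the columns with spaces, so the result is no longer TAB-delimited. As a hedged alternative, the sketch below rebuilds each line in a loop and keeps TABs throughout. It assumes TAB-delimited input whose last column is the colon-separated one; demo.tsv and demo_out.tsv are hypothetical file names, and the data row is taken from the question.

```shell
# Sample input: header plus the third data row from the question.
printf 'Chr\tStart\tEnd\tRef\tAlt\tFunc.\totherinfo\n' > demo.tsv
printf '3\t23\t30\tT\tC\tint\t%s\n' \
  '0/1:87:40:38:9:21:70%:1.7919E-9:32:36:9:0:21:0' >> demo.tsv

awk -F'\t' -v OFS='\t' '
{
  if (NR == 1) {
    extra = "RD" OFS "AD" OFS "Per"          # new header names
  } else {
    split($NF, a, ":")                       # split the last column on ":"
    extra = a[4] OFS a[6] OFS a[7]           # keep pieces 4, 6 and 7
  }
  line = $1 OFS $2 OFS extra                 # new columns go after column 2
  for (i = 3; i < NF; i++)                   # copy the rest, skipping the last column
    line = line OFS $i
  print line
}' demo.tsv > demo_out.tsv

cat demo_out.tsv
```

Because the loop stops before NF, the otherinfo column is left out without modifying NF, and the number of columns between Start and otherinfo does not need to be hard-coded.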