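I want to split the 7th column (header: otherinfo) on ":" and then paste the 4th, 6th and 7th of the resulting fields after the second column, as individual columns with different headers.
The input file will have multiple rows: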
Chr Start End Ref Alt Func. otherinfo
1 21 32 T C int 0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0
2 22 31 T C int 0/1:77:45:44:22:21:48.84%:1.8298E-8:31:35:22:0:21:0
3 23 30 T C int 0/1:87:40:38:9:21:70%:1.7919E-9:32:36:9:0:21:0
4 24 29 G T int 0/1:68:23:23:3:15:65.22%:1.4655E-7:40:33:3:0:15:0
5 25 28 C T int 1/1:55:17:17:4:13:76.47%:2.5647E-6:30:21:4:0:13:0
6 26 27 T C int 1/1:60:15:15:2:13:86.67%:8.7675E-7:38:24:2:0:13:0
7 27 26 C T int 0/1:181:1067:1067:1003:64:6%:6.9582E-19:39:39:1003:0:64:0
8 28 25 C A int 1/1:46:9:9:0:9:100%:2.0568E-5:0:38:0:0:9:0
9 29 24 T A int 0/1:255:356:356:170:186:52.25%:3.2158E-71:40:40:0:170:0:186
10 30 23 T G int 1/1:41:8:8:0:8:100%:7.77E-5:0:40:0:0:0:8
11 31 22 G A int 0/1:148:92:92:51:41:44.57%:1.387E-15:40:39:51:0:41:0
12 32 21 G C int 0/1:122:51:51:20:31:60.78%:5.6397E-13:36:35:20:0:31:0
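To make it clear which positions "4th, 6th and 7th" refer to, here is a small sketch that splits the otherinfo value of the first data row on ":" and numbers the pieces. The value is copied from the row above; the variable names are only for illustration.

```sh
#!/bin/sh
# otherinfo value copied from the first data row above
otherinfo='0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0'

# split() fills part[1..n] and returns n, the number of pieces;
# print "index value" for every piece
pieces=$(printf '%s\n' "$otherinfo" | awk '{
    n = split($0, part, ":")
    for (i = 1; i <= n; i++)
        print i, part[i]
}')
printf '%s\n' "$pieces"
```

Lines 4, 6 and 7 of the output read "4 66", "6 21" and "7 31.82%", which are the RD, AD and Per values expected in the first output row.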
and the output file should look like this:
Chr Start RD AD Per End Ref Alt Func.
1 21 66 21 31.82% 32 T C int
2 22 44 21 48.84% 31 T C int
3 23 38 21 70% 30 T C int
4 24 23 15 65.22% 29 G T int
5 25 17 13 76.47% 28 C T int
6 26 15 13 86.67% 27 T C int
7 27 1067 64 6% 26 C T int
8 28 9 9 100% 25 C A int
9 29 356 186 52.25% 24 T A int
10 30 8 8 100% 23 T G int
11 31 92 41 44.57% 22 G A int
12 32 51 31 60.78% 21 G C int
I tried splitting it with awk:
awk 'BEGIN {OFS=FS="\t"} {gsub(/\:/,"\t",$7)}1' input.txt >> output.txt
and got this output:
Chr Start End Ref Alt Func. otherinfo
1 21 32 T C int 0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0
2 22 31 T C int 0/1:77:45:44:22:21:48.84%:1.8298E-8:31:35:22:0:21:0
3 23 30 T C int 0/1:87:40:38:9:21:70%:1.7919E-9:32:36:9:0:21:0
4 24 29 G T int 0/1:68:23:23:3:15:65.22%:1.4655E-7:40:33:3:0:15:0
5 25 28 C T int 1/1:55:17:17:4:13:76.47%:2.5647E-6:30:21:4:0:13:0
6 26 27 T C int 1/1:60:15:15:2:13:86.67%:8.7675E-7:38:24:2:0:13:0
7 27 26 C T int 0/1:181:1067:1067:1003:64:6%:6.9582E-19:39:39:1003:0:64:0
8 28 25 C A int 1/1:46:9:9:0:9:100%:2.0568E-5:0:38:0:0:9:0
9 29 24 T A int 0/1:255:356:356:170:186:52.25%:3.2158E-71:40:40:0:170:0:186
10 30 23 T G int 1/1:41:8:8:0:8:100%:7.77E-5:0:40:0:0:0:8
11 31 22 G A int 0/1:148:92:92:51:41:44.57%:1.387E-15:40:39:51:0:41:0
12 32 21 G C int 0/1:122:51:51:20:31:60.78%:5.6397E-13:36:35:20:0:31:0
Please let me know how this can be done.
Thanks in advance.
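A likely reason the attempt above leaves the lines unchanged, under the assumption (not confirmed in the question) that the columns are separated by spaces rather than TABs: with FS="\t" a space-separated line is one single field, so $7 is empty and contains no ":" to replace. A minimal check that counts the fields of the first data row both ways:

```sh
#!/bin/sh
# First data row as shown in the question, with SPACES between columns
# (assumption: the real file may use TABs instead)
line='1 21 32 T C int 0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0'

# FS set to a TAB: a space-separated line becomes one single field,
# so $7 is empty and gsub() has nothing to work on
nf_tab=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')

# Default FS: runs of spaces and TABs separate the fields
nf_default=$(printf '%s\n' "$line" | awk '{ print NF }')

echo "FS=TAB: $nf_tab field(s); default FS: $nf_default field(s)"
```

Running the same two commands on the real input.txt (for example awk -F'\t' '{ print NF }' input.txt) shows which separator the file actually uses.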
Answer 0 (score: 0)
The following awk may help you.
awk 'FNR==1{print "Chr\tStart\tRD\tAD\tPer\tEnd\tRef\tAlt\tFunc.";next}{split($NF,array,":");$2=$2 OFS array[4] OFS array[6] OFS array[7];$NF=""} 1' OFS="\t" Input_file
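If your input is TAB-delimited, change awk to awk -F"\t" so that fields are split on TABs, and keep OFS="\t" in front of Input_file (as the code above already does) so that the output is TAB-delimited too. To write the result to a file, add > output_file at the end of the command.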
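Putting those notes together, a sketch of the full command on a small TAB-separated sample built from two rows of the question. The file names input.txt and output.txt are placeholders.

```sh
#!/bin/sh
# Build a TAB-separated sample from two rows of the question
# (assumption: the real input file is TAB-delimited)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    Chr Start End Ref Alt Func. otherinfo \
    1 21 32 T C int '0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0' \
    9 29 24 T A int '0/1:255:356:356:170:186:52.25%:3.2158E-71:40:40:0:170:0:186' \
    > input.txt

# Answer 0's one-liner with -F'\t' (TAB input), OFS='\t' (TAB output)
# and "> output.txt" to save the result; $NF="" leaves an empty last field
awk -F'\t' 'FNR==1{print "Chr\tStart\tRD\tAD\tPer\tEnd\tRef\tAlt\tFunc.";next}{split($NF,array,":");$2=$2 OFS array[4] OFS array[6] OFS array[7];$NF=""} 1' OFS='\t' input.txt > output.txt

cat output.txt
```

Because the otherinfo field is blanked rather than removed, each data line in output.txt ends with one trailing TAB.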
Adding a non-one-liner form of the solution too.
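awk '
FNR==1{
  # print the new header and skip the original one
  print "Chr\tStart\tRD\tAD\tPer\tEnd\tRef\tAlt\tFunc."
  next
}
{
  # split the last field (otherinfo) on ":"
  split($NF,array,":")
  # append RD, AD and Per after the Start column
  $2=$2 OFS array[4] OFS array[6] OFS array[7]
  # blank out otherinfo (this leaves an empty last field)
  $NF=""
}
1
' OFS="\t" Input_file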
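A variant that avoids the empty last field, sketched here as an alternative rather than the answerer's code: print every output column explicitly, so otherinfo is simply left out. It keeps awk's default field splitting, which treats runs of spaces and TABs as separators, so it should work for either layout as long as no field contains blanks. The file names are placeholders.

```sh
#!/bin/sh
# Sample input with the question's layout (space-separated here)
cat > input.txt <<'EOF'
Chr Start End Ref Alt Func. otherinfo
1 21 32 T C int 0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0
8 28 25 C A int 1/1:46:9:9:0:9:100%:2.0568E-5:0:38:0:0:9:0
EOF

awk -v OFS='\t' '
FNR == 1 {
    # new header: RD, AD and Per go right after Start
    print "Chr", "Start", "RD", "AD", "Per", "End", "Ref", "Alt", "Func."
    next
}
{
    # otherinfo ($7) split on ":" gives a[4] = RD, a[6] = AD, a[7] = Per
    split($7, a, ":")
    # list the output columns explicitly; otherinfo is not printed
    print $1, $2, a[4], a[6], a[7], $3, $4, $5, $6
}' input.txt > output.txt

cat output.txt
```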
Answer 1 (score: 0)
This...
$ awk 'NR==1 {$2=$2 FS "RD AD Per"; NF--}
NR>1 {split($7,a,":"); NF--;
$2=$2 FS a[4] FS a[6] FS a[7]}1' file | column -t
Chr Start RD AD Per End Ref Alt Func.
1 21 66 21 31.82% 32 T C int
2 22 44 21 48.84% 31 T C int
3 23 38 21 70% 30 T C int
4 24 23 15 65.22% 29 G T int
5 25 17 13 76.47% 28 C T int
6 26 15 13 86.67% 27 T C int
7 27 1067 64 6% 26 C T int
8 28 9 9 100% 25 C A int
9 29 356 186 52.25% 24 T A int
10 30 8 8 100% 23 T G int
11 31 92 41 44.57% 22 G A int
12 32 51 31 60.78% 21 G C int
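column -t pads the columns with spaces, which suits reading on screen. If the result should stay TAB-separated for other tools, here is a sketch of the same approach with OFS set to a TAB and the output redirected to a file (input.txt and output.txt are placeholder names). Like the answer, it relies on decrementing NF to drop the last field, which gawk and mawk support.

```sh
#!/bin/sh
# Two rows of the question's sample input (space-separated)
cat > input.txt <<'EOF'
Chr Start End Ref Alt Func. otherinfo
1 21 32 T C int 0/1:71:67:66:45:21:31.82%:7.1741E-8:33:34:45:0:21:0
10 30 23 T G int 1/1:41:8:8:0:8:100%:7.77E-5:0:40:0:0:0:8
EOF

# Header: insert the new column names after Start, then drop otherinfo.
# Data rows: split otherinfo first, drop it, then insert RD, AD and Per.
awk -v OFS='\t' '
NR == 1 { $2 = $2 OFS "RD" OFS "AD" OFS "Per"; NF-- }
NR > 1  { split($7, a, ":"); NF--; $2 = $2 OFS a[4] OFS a[6] OFS a[7] }
1' input.txt > output.txt

cat output.txt
```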