How to remove the second occurrence of a delimiter and everything after it, in every row and column

Time: 2019-05-21 13:36:17

Tags: bash shell awk sed

I have a large table (millions of rows, a few hundred columns, tab-delimited). The first few columns look like this:

GT:DS:GP    0|0:0.181:0.827,0.165,0.008 0|0:0.181:0.827,0.165,0.008 0|0:0.181:0.827,0.165,0.008
GT:DS:GP    0|0:0.109:0.894,0.103,0.003 0|0:0.109:0.894,0.103,0.003 0|0:0.109:0.894,0.103,0.003
GT:DS:GP    0|0:0.004:0.996,0.004,0.000 0|0:0.004:0.996,0.004,0.000 0|0:0.004:0.996,0.004,0.000
GT:DS:GP    0|0:0.117:0.886,0.110,0.003 0|0:0.117:0.886,0.110,0.003 0|0:0.117:0.886,0.110,0.003

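All the remaining columns look like columns 2 and 3. I need a new file, based on the first one, in which every field has the second colon (:) and everything after it removed. The output should look like this: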

GT:DS   0|0:0.181   0|0:0.181   0|0:0.181
GT:DS   0|0:0.109   0|0:0.109   0|0:0.109
GT:DS   0|0:0.004   0|0:0.004   0|0:0.004
GT:DS   0|0:0.117   0|0:0.117   0|0:0.117

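I think this may be somewhat similar to what I found in this post, but the exit command apparently tells awk to stop after the first occurrence, so it does not work when the pattern occurs many times (across several rows and columns):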

awk -v RS=':' -v ORS=':' 'NR==1{print} NR==2{print; printf"\n";exit}' input > output

The output of this failed attempt is:

GT:DS:

Thanks for your help!

1 Answer:

Answer 0: (score: 3)

$ sed 's/\([^:]*:[^:]*\):[^:\t]*/\1/g' file
GT:DS   0|0:0.181       0|0:0.181       0|0:0.181
GT:DS   0|0:0.109       0|0:0.109       0|0:0.109
GT:DS   0|0:0.004       0|0:0.004       0|0:0.004
GT:DS   0|0:0.117       0|0:0.117       0|0:0.117
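For comparison, the same trimming can probably be done field by field in awk, which avoids depending on how a given sed handles `\t` inside a bracket expression (as far as I know that escape is a GNU sed extension, so the command above may behave differently with other sed implementations). This is only a minimal sketch, not part of the original answer: the variable names `sample` and `trimmed` are made up for illustration, and the input is a shortened three-column version of two rows from the question.

```shell
# Two sample rows from the question, tab-separated (shortened to three columns).
# printf reuses its format string for each group of three arguments.
sample=$(printf '%s\t%s\t%s\n' \
    'GT:DS:GP' '0|0:0.181:0.827,0.165,0.008' '0|0:0.181:0.827,0.165,0.008' \
    'GT:DS:GP' '0|0:0.109:0.894,0.103,0.003' '0|0:0.109:0.894,0.103,0.003')

# Split each line on tabs; within every field keep only the first two
# colon-separated parts. Fields with fewer than three parts are left alone.
trimmed=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "\t" }
{
    for (i = 1; i <= NF; i++) {
        n = split($i, parts, ":")
        if (n > 2) $i = parts[1] ":" parts[2]
    }
    print
}')

printf '%s\n' "$trimmed"
```

On a real file the awk program would be run directly, as in awk '...' input > output. With several hundred columns and millions of rows, the per-field loop may well be slower than the single sed substitution, so it is worth timing both on a slice of the data first.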