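I have a large file with 28 columns containing 3 different codes (0/0, 1/1 and 0/1) that I want to convert into words. The file has millions of rows, and each line starts with "Chr":
Chr10_102 T G 999 DP 38 DP4 37 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0 0/0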
Chr1_111 C T 999 DP 37 DP4 37 0/1 1/1 0/0 0/1 0/1 0/1 0/1 0/1 0/0 0/1 0/1 0/0 0/0 0/1 1/1 1/1 0/1 0/1 0/0 1/1 0/0 0/0 0/1 0/1 0/1 0/1 1/1 0/1 ...
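I want to convert the codes in all 28 columns, on every row, as follows:
0/0 to no_variant
1/1 to homo
0/1 to het
How can I do this? I have done a similar conversion before, but then I had only one column with 2 codes (0/1 and 1/1). Now I have 28 columns to convert and 3 codes. Back then I used: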
awk '{if ($9=="0/1") {print $0,"het"} else{print $0}}' | awk '{if ($9=="1/1") {print $0,"hom"} else{print $0}}'
Thanks a lot,
Clarissa
Answer 0 (score: 2)
sed 's|0/0|no_variant|g; s|1/1|homo|g; s|0/1|het|g' file
In awk, that would be:
awk '{gsub("0/0","no_variant"); gsub("1/1","homo"); gsub("0/1","het")} 1' file
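Before running either one-liner on millions of rows, it may be worth piping a single short line through it. The sketch below does that with the sed version. The sample line is made up for illustration and is not taken from the real file.

```shell
# Made-up sample line (not from the real file), used only to check the substitutions
sample='Chr1_5 A G 999 DP 30 DP4 30 0/0 0/1 1/1'

# Same three substitutions as above, applied to the sample instead of the file
result=$(printf '%s\n' "$sample" | sed 's|0/0|no_variant|g; s|1/1|homo|g; s|0/1|het|g')

printf '%s\n' "$result"
# prints: Chr1_5 A G 999 DP 30 DP4 30 no_variant het homo
```

Note that both one-liners replace the codes anywhere on the line, not only in the genotype columns. That is fine as long as none of the leading columns can contain text such as `0/1`.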
If for some reason you need to go column by column, use a for loop (the genotype codes start at field 9):
awk '
BEGIN {c["0/0"] = "no_variant"; c["0/1"] = "het"; c["1/1"] = "homo"}
{for (n=9; n<=NF; n++) {$n = c[$n]}; print}
' file
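Two caveats with the loop as written. Any field from column 9 onward that is not one of the three codes would be replaced by an empty string. Assigning to `$n` also makes awk rejoin the line with single spaces. Here is a minimal sketch of a more defensive variant, assuming the file is tab-separated (drop the `FS`/`OFS` line if it is space-separated). The sample line and the `./.` code are hypothetical, since the data shown contains only the three codes.

```shell
# Hypothetical tab-separated sample line; "./." stands in for an unexpected code
sample=$(printf 'Chr1_5\tA\tG\t999\tDP\t30\tDP4\t30\t0/0\t./.\t1/1')

result=$(printf '%s\n' "$sample" | awk '
BEGIN {
    FS = OFS = "\t"     # assumption: tab-separated input; keep tabs on output
    c["0/0"] = "no_variant"; c["0/1"] = "het"; c["1/1"] = "homo"
}
{
    for (n = 9; n <= NF; n++)
        if ($n in c) $n = c[$n]     # translate known codes, leave anything else untouched
    print
}')

printf '%s\n' "$result"
```

On the real data you would pass `file` to awk instead of piping in the sample. The `($n in c)` test only checks whether the key exists, so unknown codes do not get added to the lookup table.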