I am working with a tab-delimited file with many columns (a VCF file); a small example is below.
1 13979 S01_13979 C G . . PR GT ./. ./.
1 13980 S01_13980 G A . . PR GT ./. ./.
1 13986 S01_13986 G A . . PR GT ./. ./.
1 14023 S01_14023 G A . . PR GT 0/0 ./.
1 15671 S01_15671 A T . . PR GT 0/0 0/0
1 60519 S01_60519 A G . . PR GT 0/0 0/0
1 60531 S01_60531 T C . . PR GT 0/0 0/0
1 63378 S01_63378 A G . . PR GT 1/1 ./.
1 96934 S01_96934 C T . . PR GT 0/0 0/0
1 96938 S01_96938 C T . . PR GT 0/0 0/0
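The first column (chromosome name) contains numbers from 1 to 26 (e.g. 1, 2, ... 25, 26). I want to add the prefix HanXRQChr0 to the numbers 1 through 9 and the prefix HanXRQChr to the numbers 10 through 26. Values in all other columns should remain unchanged.
I tried a sed solution, but the output is not quite right (the last stage of the pipeline does not work):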
cat test.vcf | sed -r '/^[1-9]/ s/^[1-9]/HanXRQChr0&/' | sed -r '/^[1-9]/ s/^[0-9]{2}/HanXRQChr&/' > test-1.vcf
How can I do this with AWK? I think AWK would be safer in my case, since it can change just the first column of the file directly.
Answer 0: (score: 2)
Since you did not provide sample input, here is the script run on mock data:
$ seq 1 3 30 | awk '1<=$1 && $1<=26 {$1=sprintf("HanXRQChr%02d",$1)}1'
HanXRQChr01
HanXRQChr04
HanXRQChr07
HanXRQChr10
HanXRQChr13
HanXRQChr16
HanXRQChr19
HanXRQChr22
HanXRQChr25
28
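Note that 28 falls outside the 1 to 26 range, so it is left without a prefix.
To keep the tab delimiters from being converted to spaces, add a BEGIN block at the start: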
$ awk 'BEGIN{FS=OFS="\t"} ...
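Putting the two pieces above together, a minimal sketch could look like the following. The file names sample.vcf and sample-renamed.vcf and the three shortened rows are made up for illustration; they are not the asker's real data.

```shell
# Hypothetical tab-separated sample, modelled on the rows in the question.
printf '1\t13979\tS01_13979\tC\tG\n12\t60519\tS01_60519\tA\tG\n28\t96934\tS01_96934\tC\tT\n' > sample.vcf

# FS=OFS="\t" makes awk split on tabs and rebuild the record with tabs
# once $1 has been modified; %02d zero-pads 1..9 to 01..09.
awk 'BEGIN{FS=OFS="\t"} 1<=$1 && $1<=26 {$1=sprintf("HanXRQChr%02d",$1)}1' sample.vcf > sample-renamed.vcf
```

With this sample, the first column of the output should read HanXRQChr01, HanXRQChr12 and 28, and the remaining columns stay tab-separated.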
Answer 1: (score: 2)
Could you please try the following.
awk -v first="HanXRQChr0" -v second="HanXRQChr" '
$1>=1 && $1<=9{
$1=first $1
}
$1>=10 && $1<=26{
$1=second $1
}
1' Input_file
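You can change the values of the variables first and second as needed. The script checks whether the first field is between 1 and 9 and, if so, prepends the value of first to it; if the first field is between 10 and 26, it prepends the value of second instead.
Explanation: the same code as above, with comments added.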
awk -v first="HanXRQChr0" -v second="HanXRQChr" ' ##Creating variable named first and second and you could keep their values as per your need.
$1>=1 && $1<=9{ ##Checking condition when first field is greater than or equal to 1 and less than or equal to 9 here then do following.
$1=first $1 ##Re-creating the first field and adding variable first value before it here.
} ##closing this condition block here.
$1>=10 && $1<=26{ ##Checking condition here if 1st field is greater than or equal to 10 AND lesser than or equal to 26 then do following.
$1=second $1 ##Re-creating first field value and adding variable second value before $1 here.
} ##Closing this condition block here.
1 ##Mentioning 1 will be printing the line here.
' Input_file ##Mentioning Input_file name here.
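One thing to watch for with a tab-separated VCF: assigning to $1 makes awk rebuild the record using OFS, which defaults to a single space. The sketch below is a variant of the code above that sets FS and OFS to a tab. It also uses if / else if, so that a field that has already been prefixed (and is therefore a string) is not compared against 10 and 26 a second time. The file names demo.vcf and demo-prefixed.vcf and the sample rows are hypothetical.

```shell
# Hypothetical tab-separated sample with one row per case: 1-9, 10-26, out of range.
printf '3\t100\tA\tG\n10\t200\tC\tT\n27\t300\tG\tA\n' > demo.vcf

awk -v first="HanXRQChr0" -v second="HanXRQChr" '
BEGIN { FS = OFS = "\t" }            # split on tabs and write tabs back out
{
  if ($1 >= 1 && $1 <= 9)            # single-digit chromosomes get the padded prefix
    $1 = first $1
  else if ($1 >= 10 && $1 <= 26)     # two-digit chromosomes get the plain prefix
    $1 = second $1
}
1                                    # print every line, changed or not
' demo.vcf > demo-prefixed.vcf
```

With this sample, the first column should become HanXRQChr03, HanXRQChr10 and 27, with the tabs between columns preserved.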