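Replace a character in a string with the string from the next column (in one file)

Time: 2014-02-18 10:17:13

Tags: perl bash parsing awk

I want to replace the "." in the middle of the second column with the string in the third column.

Input file (tab-delimited):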

0   AAAAAAAAGTTT.TATAGTAATATA   T   x   HPNK_05032012_new.fna
1   AAAAAAACGACG.ATTTTACAATAC   C   x   HPNK_05032012_new.fna
2   AAAAAAAGCAGG.CATTATCGCTGG   G   x   HPNK_05032012_new.fna
3   AAAAAAAGGAAC.GTGGAACGTTGG   A   x   HPNK_05032012_new.fna
5   AAAAAACACAAC.ATTGAGCAACTT   A   x   HPNK_05032012_new.fna
6   AAAAAACACCCA.CTGTGAAAGAAA   T   x   HPNK_05032012_new.fna
9   AAAAAACGCCAA.GTCAGCTACAAA   C   x   HPNK_05032012_new.fna

Desired output:

0   AAAAAAAAGTTTTTATAGTAATATA   T   x   HPNK_05032012_new.fna
1   AAAAAAACGACGCATTTTACAATAC   C   x   HPNK_05032012_new.fna
2   AAAAAAAGCAGGGCATTATCGCTGG   G   x   HPNK_05032012_new.fna
3   AAAAAAAGGAACAGTGGAACGTTGG   A   x   HPNK_05032012_new.fna
5   AAAAAACACAACAATTGAGCAACTT   A   x   HPNK_05032012_new.fna
6   AAAAAACACCCATCTGTGAAAGAAA   T   x   HPNK_05032012_new.fna
9   AAAAAACGCCAACGTCAGCTACAAA   C   x   HPNK_05032012_new.fna

3 answers:

Answer 0 (score: 3)

Use:

$ awk '{sub(/\./, $3, $2)}1' file
0 AAAAAAAAGTTTTTATAGTAATATA T x HPNK_05032012_new.fna
1 AAAAAAACGACGCATTTTACAATAC C x HPNK_05032012_new.fna
2 AAAAAAAGCAGGGCATTATCGCTGG G x HPNK_05032012_new.fna
3 AAAAAAAGGAACAGTGGAACGTTGG A x HPNK_05032012_new.fna
5 AAAAAACACAACAATTGAGCAACTT A x HPNK_05032012_new.fna
6 AAAAAACACCCATCTGTGAAAGAAA T x HPNK_05032012_new.fna
9 AAAAAACGCCAACGTCAGCTACAAA C x HPNK_05032012_new.fna

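This uses the sub() function to replace the first literal . in the second field with the third field. The trailing 1 is an always-true pattern, so it triggers awk's default action: {print $0}

Since your question shows spaces between the columns, my output shows a single space: when a field is modified, awk rebuilds the line using the output field separator, which defaults to one space. If your input is tab-separated, set the tab as both the input and output field separator: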
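To see why the dot has to be matched literally, here is a minimal sketch run on one sample line from the question. It assumes the regex-literal form /\./ for the escaped dot; the variable names (line, escaped, unescaped) are only illustrative.

```shell
# One sample line from the question, with tabs between the fields.
line=$(printf '0\tAAAAAAAAGTTT.TATAGTAATATA\tT\tx\tHPNK_05032012_new.fna')

# Escaped dot: only the literal "." in field 2 is replaced by field 3.
escaped=$(printf '%s\n' "$line" | awk '{sub(/\./, $3, $2)} 1')

# Unescaped dot: "." matches any character, so the FIRST character
# of field 2 is replaced and the real dot stays in place.
unescaped=$(printf '%s\n' "$line" | awk '{sub(/./, $3, $2)} 1')

printf '%s\n%s\n' "$escaped" "$unescaped"
```

The first line printed has the dot filled in; the second still contains the dot and has its leading A overwritten, which is the failure mode to watch for if the escape gets lost.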

awk 'BEGIN{FS=OFS="\t"} {sub(/\./, $3, $2)}1' file
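A quick way to check that the tabs survive is to count them in the output. This is only a sketch under assumptions: the temporary file stands in for your real input, and the names tmp, result and tabs are made up for the example.

```shell
# Write one tab-delimited sample line to a temporary file (hypothetical stand-in for the real input).
tmp=$(mktemp)
printf '3\tAAAAAAAGGAAC.GTGGAACGTTGG\tA\tx\tHPNK_05032012_new.fna\n' > "$tmp"

# With FS and OFS both set to a tab, the rebuilt line keeps tab separators.
result=$(awk 'BEGIN{FS=OFS="\t"} {sub(/\./, $3, $2)} 1' "$tmp")

# Count the tab characters in the result; five columns should give four tabs.
tabs=$(printf '%s' "$result" | tr -cd '\t' | wc -c | tr -d ' ')

printf '%s\n' "$result"
printf 'tabs: %s\n' "$tabs"
rm -f "$tmp"
```

Setting only FS would still split correctly, but the output would come back joined by single spaces, so OFS matters here as much as FS.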

Answer 1 (score: 2)

perl -lane '$F[1] =~ s/[.]/$F[2]/; print "@F"' file

Or shorter:

perl -ape 's/[.]/$F[2]/' file

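Answer 2 (score: 1)

Using awk, in a way that preserves the original formatting. With an empty FS (supported by gawk, among others), every character is its own field, so this copies character 33 over character 19; those positions depend on the exact column widths of the input file: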

awk '$19=$33' FS="" OFS="" file
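As a rough illustration of how the character positions work out, here is a sketch for one sample line where the columns are separated by single tabs and the first column is one digit wide. Under those assumptions the dot is character 15 and the third column is character 29, so those numbers replace the ones above; they are specific to this layout and would shift with a wider first column. Splitting on an empty FS is not specified by POSIX, so this relies on an awk such as gawk or mawk. The assignment is placed in an action followed by 1 rather than used as the pattern, so the line is printed regardless of the value copied.

```shell
# One tab-delimited sample line: "6", tab, 25-character sequence, tab, "T", ...
line=$(printf '6\tAAAAAACACCCA.CTGTGAAAGAAA\tT\tx\tHPNK_05032012_new.fna')

# Empty FS: each character is a field. Empty OFS: the line is rebuilt
# without inserting anything, so the original tabs are preserved.
# Character 15 is the dot, character 29 is the third column (for this layout only).
result=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS=""} {$15=$29} 1')

printf '%s\n' "$result"
```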