I want to replace every word in a file that matches a hash key with the corresponding value. The hash:
$VAR1 = {
    'asmbl_1' => 'TCONS_00000046',
    'asmbl_2' => 'TCONS_00000014',
    'asmbl_16' => 'MELO3C000012',
}
The file:
CM3.6.1_CONTIG30890 assembler transcript 187 1568 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:184317|asmbl_1";
CM3.6.1_CONTIG30890 assembler exon 187 251 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:184317|asmbl_1";
CM3.6.1_CONTIG30898 assembler exon 1339 2793 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:184318|asmbl_2";
Desired output:
CM3.6.1_CONTIG30890 assembler transcript 187 1568 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:184317|TCONS_00000046";
CM3.6.1_CONTIG30890 assembler exon 187 251 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:184317|TCONS_00000046";
CM3.6.1_CONTIG30898 assembler exon 1339 2793 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:184318|TCONS_00000014";
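I am looking for a straightforward way to do this, preferably in Perl, since the rest of my script is in Perl.
One alternative is a sed call such as
sed -i 's/key/value/'
but that is a bit ugly, and I would rather do everything in Perl. (What is the difference between the two approaches?)

Answer 0 (score: 3)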
A trick I like is to build a single regex from the hash keys, then use it to capture each match and look up its replacement:
use strict;
use warnings;
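my %replace = (
    'asmbl_1' => 'TCONS_00000046',
    'asmbl_2' => 'TCONS_00000014',
    'asmbl_16' => 'MELO3C000012',
);
# Escape each key and join them into one alternation, longest keys first.
my $search = join( "|", map {quotemeta} sort { length ($b) <=> length ($a) } keys %replace );
# \b anchors the match to whole words, so asmbl_1 does not match inside asmbl_16.
$search = qr/\b($search)\b/;
while (<>) {
    s/$search/$replace{$1}/g;
    print;
}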
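Something like this produces the desired output. (The diamond operator <> reads from STDIN, or from a file named on the command line, as in myscript.pl <some_File_To_process>.)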
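The script above prints its result to STDOUT and leaves the input file untouched. If the goal is to change the file itself, as sed -i would, one option is Perl's in-place editing variable $^I. The following is only a minimal sketch under that assumption: the helper name replace_in_files and the .bak backup suffix are illustrative choices, not part of the answer.

```perl
use strict;
use warnings;

my %replace = (
    'asmbl_1'  => 'TCONS_00000046',
    'asmbl_2'  => 'TCONS_00000014',
    'asmbl_16' => 'MELO3C000012',
);

# Hypothetical helper: rewrite each named file in place, keeping a .bak copy.
sub replace_in_files {
    my ($map, @files) = @_;
    return unless @files;    # with an empty @ARGV, <> would read STDIN instead

    # Same regex construction as in the answer: escaped keys, longest first.
    my $alternation = join '|', map { quotemeta }
        sort { length($b) <=> length($a) } keys %$map;
    my $search = qr/\b($alternation)\b/;

    local $^I   = '.bak';    # turn on in-place editing with a backup suffix
    local @ARGV = @files;    # <> iterates over these files
    while ( my $line = <> ) {
        $line =~ s/$search/$map->{$1}/g;
        print $line;         # goes to the file being rewritten, not STDOUT
    }
    return;
}

replace_in_files( \%replace, @ARGV );
```

Invoked as myscript.pl <some_File_To_process>, this should leave the replaced text in the original file and the previous contents in a copy with .bak appended to the name. Setting $^I to an empty string instead would skip the backup, which is riskier if the hash turns out to be wrong.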

Answer 1 (score: 3)
This is all that is necessary:
use strict;
use warnings;
my %map = (
    asmbl_1 => 'TCONS_00000046',
    asmbl_2 => 'TCONS_00000014',
    asmbl_16 => 'MELO3C000012',
);
# One alternation of all escaped keys; \b below restricts matches to whole words.
my $re = join '|', map quotemeta, keys %map;
while ( <DATA> ) {
    s/\b($re)\b/$map{$1}/g;
    print;
}
__DATA__
CM3.6.1_CONTIG30890 assembler transcript 187 1568 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:184317|asmbl_1";
CM3.6.1_CONTIG30890 assembler exon 187 251 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:184317|asmbl_1";
CM3.6.1_CONTIG30898 assembler exon 1339 2793 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:184318|asmbl_2";
Output:
CM3.6.1_CONTIG30890 assembler transcript 187 1568 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:184317|TCONS_00000046";
CM3.6.1_CONTIG30890 assembler exon 187 251 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:184317|TCONS_00000046";
CM3.6.1_CONTIG30898 assembler exon 1339 2793 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:184318|TCONS_00000014";
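The sample data never exercises the asmbl_16 key, so it does not show what the \b anchors buy. The sketch below wraps the same substitution in a helper, replace_keys (a hypothetical name used only for this illustration), and feeds it a made-up string in which one key is a prefix of another word.

```perl
use strict;
use warnings;

my %map = (
    asmbl_1  => 'TCONS_00000046',
    asmbl_2  => 'TCONS_00000014',
    asmbl_16 => 'MELO3C000012',
);

# Hypothetical helper: return a copy of $text with whole-word keys replaced.
sub replace_keys {
    my ($map, $text) = @_;
    my $re = join '|', map { quotemeta } keys %$map;
    # \b on both sides means a key only matches when it is a complete word.
    $text =~ s/\b($re)\b/$map->{$1}/g;
    return $text;
}

# Made-up input: asmbl_160 is not a key and must be left alone.
print replace_keys( \%map, "asmbl_16 asmbl_160 asmbl_1\n" );
# prints "MELO3C000012 asmbl_160 TCONS_00000046"
```

Because the regex engine backtracks into the next alternative when \b fails after asmbl_1, the result does not depend on the order in which keys %map returns the keys. Without the \b anchors, asmbl_16 could be rewritten as TCONS_000000466, which is why sorting by length matters only for the unanchored variant.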