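I have two files (CSV/text). File 1 has two columns: column one holds gene IDs and column two holds gene names. File 2 has many columns, and part of each string in it is a gene ID, e.g. geneID(genome) or pseudogene id(genome). I want to compare every gene ID in file 1 against every gene ID in file 2 and replace the gene IDs in file 2 with the matching gene names from file 1, writing the result to file 3.
File 1: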
SPAR5_0024, coA binding domain protein
SPAR5_0025, hypothetical protein
SPAR5_0026, hypothetical protein
File 2:
SPAR5_0024(72.AFAX01.1.gb) SPAR5_0026(72.AFAX01.1.gbff) SPAR5_0025(72.AFAX01.1.gbff)
Desired output (file 3):
coA binding domain protein(72.AFAX01.1.gb) hypothetical protein(72.AFAX01.1.gbff) hypothetical protein(72.AFAX01.1.gbff)
With my code I get an empty file 3.
This is what I am running:
#!/usr/local/bin/perl -w
use strict;
use warnings;
my $file1 = "annot.txt";
my $file2 = "orthomcl.csv";
my $file3 = "combi.csv";
open (FILE1,"$file1") || die;
open (FILE2,"$file2") || die;
open (FILE3,">$file3") || die;
my @file1 = <FILE1>;
my @file2 = <FILE2>;
my %file1;
while ( my $value = <FILE1> ) {
    chomp $value;
    my @file1 = split /\s+/, $_;
    $file1{$value} = 1;
}
my %file2;
while (my $value = <FILE2>) {
    chomp $value;
    my @file2 = split /\s+/, $_;
    if ( $file1{ $value } ) {
        $file2 = $file1{ $file2 };
        print join( "\t" => @file2 ), $/;
    }
}
close (FILE1);
close (FILE2);
close (FILE3);
Answer 0 (score: 0)
The main error is that
my @file1 = <FILE1>;
my @file2 = <FILE2>;
consume all the data in the files, so there is nothing left to read in
while ( my $value = <FILE1> ) {
and
while (my $value = <FILE2>) {
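For illustration, here is a minimal sketch of how the question's script could look once the two slurping lines are removed. Note also that the original `print` has no filehandle, so even with data it would write to STDOUT rather than to FILE3. The helper names `load_names`, `rename_ids` and `combine` are hypothetical, and the sketch assumes file 1 lines look like `ID, name` and that every ID in file 2 is immediately followed by `(`.

```perl
use strict;
use warnings;

# Build an ID => name lookup from lines such as
# "SPAR5_0024, coA binding domain protein".
sub load_names {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open '$path': $!";
    my %name_for;
    while (my $line = <$in>) {
        chomp $line;
        # Split on the first comma only, so names may contain commas.
        my ($id, $name) = split /,\s*/, $line, 2;
        next unless defined $name;    # skip lines without a comma
        $name =~ s/\s+$//;            # drop trailing whitespace
        $name_for{$id} = $name;
    }
    close $in;
    return \%name_for;
}

# Replace the word in front of each "(" when it is a known ID;
# unknown words are left unchanged.
sub rename_ids {
    my ($line, $name_for) = @_;
    $line =~ s/(\w+)(?=\()/exists $name_for->{$1} ? $name_for->{$1} : $1/ge;
    return $line;
}

# Read file 2 line by line and write the renamed lines to file 3.
sub combine {
    my ($annot_path, $in_path, $out_path) = @_;
    my $name_for = load_names($annot_path);
    open(my $in,  '<', $in_path)  or die "Cannot open '$in_path': $!";
    open(my $out, '>', $out_path) or die "Cannot open '$out_path': $!";
    while (my $line = <$in>) {
        print {$out} rename_ids($line, $name_for);
    }
    close $in;
    close $out or die "Cannot close '$out_path': $!";
    return;
}
```

With the file names from the question, it would be called as `combine('annot.txt', 'orthomcl.csv', 'combi.csv');`.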
Answer 1 (score: 0)
Here is an example of how you can insert the annotations from the first file, annot.txt, into the second file, orthomcl.csv:
use feature qw(say);
use strict;
use warnings;

{
    # Map each gene ID to its gene name.
    my $map = read_annot();
    # One regex that matches any known gene ID (IDs are escaped with quotemeta).
    my ($regex) = map { qr/$_/ } join '|', map { quotemeta } keys %$map;
    my $fn = 'orthomcl.csv';
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    # Slurp the whole file into a single string.
    my $str = do { local $/; <$fh> };
    close $fh;
    # Replace every gene ID with its gene name.
    $str =~ s/($regex)/$map->{$1}/ge;
    save_combi( $str );
}

# Write the modified text to combi.csv.
sub save_combi {
    my ( $str ) = @_;
    my $fn = 'combi.csv';
    open ( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
    print $fh $str;
    close $fh;
    say "Saved: '$fn'";
}

# Read annot.txt and return a hash reference: gene ID => gene name.
sub read_annot {
    my $fn = 'annot.txt';
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    my %map;
    while (my $line = <$fh> ) {
        chomp $line;
        my ( $key, $value ) = $line =~ /^(\S+),\s+(.*)$/;
        # Skip lines that do not look like "ID, name".
        next unless defined $key;
        $value =~ s/\s+$//;
        $map{$key} = $value;
    }
    close $fh;
    return \%map;
}
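To see the substitution step in isolation, here is a small sketch that applies the same alternation-regex idea to the sample data from the question, held in memory instead of in files. The helper name `replace_ids` is hypothetical. Two details are additions of mine rather than part of the answer: the IDs are sorted longest first, so that an ID which is a prefix of another cannot shadow it, and an empty map is returned early, because an empty alternation would match at every position.

```perl
use strict;
use warnings;

# Replace every known ID in $text with its name from %$map.
sub replace_ids {
    my ($text, $map) = @_;
    return $text unless %$map;    # nothing to replace

    # Longest IDs first; ties are broken alphabetically for a stable order.
    my @ids = sort { length($b) <=> length($a) || $a cmp $b } keys %$map;
    my $alternation = join '|', map { quotemeta } @ids;
    my $regex = qr/($alternation)/;

    $text =~ s/$regex/$map->{$1}/ge;
    return $text;
}

# Sample data taken from the question.
my %map = (
    'SPAR5_0024' => 'coA binding domain protein',
    'SPAR5_0025' => 'hypothetical protein',
    'SPAR5_0026' => 'hypothetical protein',
);
my $line = 'SPAR5_0024(72.AFAX01.1.gb) SPAR5_0026(72.AFAX01.1.gbff) SPAR5_0025(72.AFAX01.1.gbff)';

print replace_ids($line, \%map), "\n";
```

For this input the printed line is the desired output given in the question.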