Question

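I have two files: (1) Uniq_ID.txt, which contains IDs, and (2) Information.txt, which contains IDs with their associated information. Each ID can have one or more corresponding entries in file (2). When an ID matches between the two files, I want to print all of its entries on a single line, separated by ";".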

(1) Uniq_ID.txt

a12
b13
c14
d15

(2) Information.txt

a12 AAA BBB
a12 ppp yyy
b13 CCC DDD
b13 GGG SSS
c14 HHH KKK
c14 JJJ OOO
d15 LLL LLL

Expected output

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL

program.pl

#!/usr/bin/perl                                                                                          
#./program.pl Uniq_ID.txt Information.txt                                                                        
@aa=();
%data=();
@arrayname=();
$file1=$ARGV[0];
$file2=$ARGV[1];
open(FP1, $file1);
while($name1=<FP1>)
{
  chomp($name1);
  #collect the name according to the Uniq_ID                                                           
  $arrayname[$i]=$name1;
  $i++;
  open(FP2, $file2);
  while($info=<FP2>)
  {
    chomp($info);
    @aa=split(/\s/,$info);
    $name2=$aa[0];
    $seq=$aa[1];
    #if name in Uniq_ID is same with name in information.txt                                                    
    if($name1 =~ /^$name2$/)
    {                                                          
        #hash of arrays"                                                                          
        #put each line of information.txt into a Uniq_ID                                                     
        push @{$data{$arrayname[$i]}}, $info;
      }
  }
}
foreach (@arrayname){
print "$_:\t@{$data{$_}}\n";
}

I run the program with "./program.pl Uniq_ID.txt Information.txt", but I get the following result:

a12:
b13:
c14:
d15:

Could you tell me what is wrong with my program? Thanks.

Answer 1

All that is necessary is to push each line onto the corresponding element of a hash. It looks like this:

use strict;
use warnings;
use autodie;

my @ids;

open my $fh, '<', 'Uniq_ID.txt';
push @ids, (split)[0] while <$fh>;

my %data;

open $fh, '<', 'Information.txt';
while (<$fh>) {
  chomp;
  my ($id) = split;
  push @{ $data{$id} }, $_;
}

for my $id (@ids) {
  printf "%s:%s\n", $id, join ';', @{ $data{$id} };
}

Output

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL
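
One caveat, offered as a sketch rather than as part of the answer above: if an ID in Uniq_ID.txt has no lines at all in Information.txt, `@{ $data{$id} }` dereferences an undefined value, which I would expect to be fatal under `use strict`. Assuming such IDs should simply print with nothing after the colon, a defined-or default (`// []`, available since Perl 5.10) avoids that. The in-memory data below, including the ID `e16`, is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for the two files.
my @ids  = qw(a12 e16);              # e16 has no information lines
my %data = (a12 => ['a12 AAA BBB', 'a12 ppp yyy']);

my @lines;
for my $id (@ids) {
    # Fall back to an empty array ref when the ID was never seen.
    my $entries = $data{$id} // [];
    push @lines, sprintf '%s:%s', $id, join ';', @$entries;
}
print "$_\n" for @lines;
```

With this guard, `e16` prints as a bare `e16:` line instead of stopping the script.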

Answer 2

Always include use strict; and use warnings; in every script. If you do any file processing, also include use autodie;.
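
To illustrate why this advice matters here, the sketch below reproduces the indexing mistake from the question in isolation and captures what `use warnings` reports. The variable names follow the question; the `$SIG{__WARN__}` handler and the single sample line are assumptions added only to make the warning visible.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my @arrayname = ('a12');
my %data;
my $i = 1;                                # one past the last stored element

# $arrayname[$i] is undef here, so the key silently becomes ''.
push @{ $data{ $arrayname[$i] } }, 'a12 AAA BBB';

print "warning: $_" for @warnings;
print "keys: ", join(',', map { "'$_'" } keys %data), "\n";
```

The captured warning mentions an uninitialized value, and the only key in `%data` is the empty string, which is why the original program printed nothing after each ID.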

Your code can be greatly simplified by processing each file only once, as follows:

use strict;
use warnings;
use autodie;

my ($id_file, $info_file) = @ARGV;

my %info;
open my $fh, '<', $info_file;   # Information.txt: lines of "ID field field"
while (<$fh>) {
    chomp;
    my ($id) = split;
    push @{$info{$id}}, $_;
}

open $fh, '<', $id_file;        # Uniq_ID.txt: one ID per line
while (<$fh>) {
    chomp;
    print "$_:" . join(';', @{$info{$_}}) . "\n";
}

Output:

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL

Answer 3

Here is a one-liner using perl that can be run from the command line:

perl -lne '
BEGIN {
    $x = pop; 
    push @{$h{$_->[0]}}, "@$_" for map [split], <>; 
    @ARGV = $x
}
print "$_:" . join ";" , @{ $h{$_} }' Information.txt Uniq_ID.txt

Output:

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL
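
For readers who find the one-liner dense, here is a rough expansion of the same idea into an ordinary script. It is a sketch, not the answer author's code: the BEGIN block above reads Information.txt into a hash of arrays first, and the implicit -n loop then runs once per line of Uniq_ID.txt. The sub name `build_index` and the in-memory sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Build a hash of arrays: ID => [ normalized lines for that ID ].
# Mirrors: push @{$h{$_->[0]}}, "@$_" for map [split], <>;
sub build_index {
    my @info_lines = @_;
    my %h;
    for my $fields (map { [split] } @info_lines) {
        # "@$fields" rejoins the fields with single spaces.
        push @{ $h{ $fields->[0] } }, "@$fields";
    }
    return \%h;
}

my $h = build_index("a12 AAA BBB\n", "a12  ppp yyy\n", "b13 CCC DDD\n");

# Mirrors the body of the -n loop, run once per ID line.
my @out = map { "$_:" . join(';', @{ $h->{$_} }) } qw(a12 b13);
print "$_\n" for @out;
```

Because each line is split and rejoined, runs of whitespace in Information.txt are collapsed to single spaces in the output.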

Answer 4

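The problem is that you increment $i right after storing $name1 in $arrayname at that index, and then try to access the element at $i again, which by then points one past it. As a result $arrayname[$i] is undefined, so every matching line is pushed under the empty-string key instead of under the ID. Increment $i after you have stored $info, or use push instead.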

while($name1=<FP1>)
{
  chomp($name1);
  #collect the name according to the Uniq_ID                                                           
  $arrayname[$i]=$name1; # <-- You insert into the array at $i here 
  $i++;                  # <-- You increment $i here
  open(FP2, $file2);
  while($info=<FP2>)
  {
    chomp($info);
    @aa=split(/\s/,$info);
    $name2=$aa[0];
    $seq=$aa[1];
    #if name in Uniq_ID is same with name in information.txt                                                    
    if($name1 =~ /^$name2$/)
    {                                                          
        #hash of arrays"                                                                          
        #put each line of information.txt into a Uniq_ID                                                     
        push @{$data{$arrayname[$i]}}, $info; # <-- You access the element at $i here
      }
  }
  # <-- You should increment $i here (but use push instead)
}
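
Putting that advice together, a minimally changed version of the loop might look like the sketch below. It keeps the question's nested-loop structure, replaces the `$i` bookkeeping with `push`, and keys the hash by the ID itself. Comparing with `eq` instead of a regex is my own substitution, on the assumption that IDs are meant to match literally. The in-memory sample lines stand in for the two files.

```perl
use strict;
use warnings;

# Stand-ins for the contents of Uniq_ID.txt and Information.txt.
my @id_lines   = ("a12\n", "d15\n");
my @info_lines = ("a12 AAA BBB\n", "a12 ppp yyy\n", "d15 LLL LLL\n");

my (@arrayname, %data);
for my $name1 (@id_lines) {
    chomp(my $id = $name1);
    push @arrayname, $id;            # no index variable to get out of step
    for my $line (@info_lines) {
        chomp(my $info = $line);
        my ($name2) = split ' ', $info;
        # Key the hash by the ID itself, not by $arrayname[$i].
        push @{ $data{$id} }, $info if $id eq $name2;
    }
}

my @out = map { "$_:" . join(';', @{ $data{$_} }) } @arrayname;
print "$_\n" for @out;
```

This still rescans the information lines once per ID, so the single-pass approaches in the other answers remain preferable for large files.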
