Merge records with the same ID using a hash of arrays

Time: 2014-05-31 19:43:54

Tags: perl

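I have two files: (1) Uniq_ID.txt, which contains IDs, and (2) Information.txt, which contains IDs and their corresponding information. Each ID can have one or more lines of information in file (2). If an ID appears in both files, I want to print all of its information on a single line, separated by ";".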

(1) Uniq_ID.txt

a12
b13
c14
d15

(2) Information.txt

a12 AAA BBB
a12 ppp yyy
b13 CCC DDD
b13 GGG SSS
c14 HHH KKK
c14 JJJ OOO
d15 LLL LLL

Expected output

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL

program.pl

#!/usr/bin/perl                                                                                          
#./program.pl Uniq_ID.txt Information.txt                                                                        
@aa=();
%data=();
@arrayname=();
$file1=$ARGV[0];
$file2=$ARGV[1];
open(FP1, $file1);
while($name1=<FP1>)
{
  chomp($name1);
  #collect the name according to the Uniq_ID                                                           
  $arrayname[$i]=$name1;
  $i++;
  open(FP2, $file2);
  while($info=<FP2>)
  {
    chomp($info);
    @aa=split(/\s/,$info);
    $name2=$aa[0];
    $seq=$aa[1];
    #if name in Uniq_ID is same with name in information.txt                                                    
    if($name1 =~ /^$name2$/)
    {                                                          
        #hash of arrays"                                                                          
        #put each line of information.txt into a Uniq_ID                                                     
        push @{$data{$arrayname[$i]}}, $info;
      }
  }
}
foreach (@arrayname){
print "$_:\t@{$data{$_}}\n";
}

I ran the program with "./program.pl Uniq_ID.txt Information.txt", but got the following result:

a12:
b13:
c14:
d15:

Could you tell me what is wrong with my program? Thanks.

4 answers:

Answer 0 (score: 2)

All that is necessary is to push each line onto the corresponding element of the hash. It looks like this:

use strict;
use warnings;
use autodie;

my @ids;

open my $fh, '<', 'Uniq_ID.txt';
push @ids, (split)[0] while <$fh>;

my %data;

open $fh, '<', 'Information.txt';
while (<$fh>) {
  chomp;
  my ($id) = split;
  push @{ $data{$id} }, $_;
}

for my $id (@ids) {
  printf "%s:%s\n", $id, join ';', @{ $data{$id} };
}

Output:

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL
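
The loop above assumes every ID in Uniq_ID.txt has at least one line in Information.txt. When that is not guaranteed, dereferencing a missing hash entry under use strict is generally a fatal error. The following is a minimal sketch of one way to guard against it, not part of the original answer. The in-memory arrays and the ID z99 are hypothetical stand-ins for the two files, and the // operator requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for Uniq_ID.txt and Information.txt.
my @ids  = qw(a12 b13 z99);    # z99 deliberately has no information lines
my @info = ('a12 AAA BBB', 'a12 ppp yyy', 'b13 CCC DDD');

# Build the hash of arrays: ID => [ all lines that start with that ID ].
my %data;
for my $line (@info) {
    my ($id) = split ' ', $line;
    push @{ $data{$id} }, $line;
}

# Fall back to an empty array reference when an ID has no records.
my @out;
for my $id (@ids) {
    my $records = $data{$id} // [];
    push @out, sprintf('%s:%s', $id, join(';', @$records));
}

print "$_\n" for @out;
```

With this fallback an unmatched ID prints as the ID followed by a bare colon. Skipping such IDs entirely would be an equally reasonable choice.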

Answer 1 (score: 2)

Always include use strict; and use warnings; in every script. If you do any file processing, also include use autodie;.

Your code can be simplified greatly by processing each file just once, as follows:

use strict;
use warnings;
use autodie;

my ($id_file, $info_file) = @ARGV;

my %info;
open my $fh, '<', $info_file; # \"a12 AAA BBB\na12 ppp yyy\nb13 CCC DDD\nb13 GGG SSS\nc14 HHH KKK\nc14 JJJ OOO\nd15 LLL LLL";
while (<$fh>) {
    chomp;
    my ($id) = split;
    push @{$info{$id}}, $_;
}

open $fh, '<', $id_file; # \"a12\nb13\nc14\nd15";
while (<$fh>) {
    chomp;
    print "$_:" . join(';', @{$info{$_}}) . "\n";
}

Output:

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL
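
As an illustration of why this answer recommends use warnings, the sketch below reproduces only the indexing pattern from the question, with a hard-coded ID and information line as assumptions, and captures whatever warnings Perl emits. It is a reduced demonstration, not the asker's full program.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR right away.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my @arrayname;
my %data;
my $i = 0;

# Same sequence as the question: store at $i, increment, then read at $i.
$arrayname[$i] = 'a12';
$i++;
push @{ $data{ $arrayname[$i] } }, 'a12 AAA BBB';    # $arrayname[1] is undef

print scalar(@warnings), " warning(s) captured\n";
print $warnings[0] if @warnings;
```

The undefined element is used as a hash key, so the record is filed under the empty string rather than under a12. With warnings enabled, Perl reports the use of an uninitialized value at that line, which points directly at the bug.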

Answer 2 (score: 1)

Here is a Perl one-liner that can be run from the command line:

perl -lne '
BEGIN {
    $x = pop; 
    push @{$h{$_->[0]}}, "@$_" for map [split], <>; 
    @ARGV = $x
}
print "$_:" . join ";" , @{ $h{$_} }' Information.txt Uniq_ID.txt

Output:

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL

Answer 3 (score: 0)

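The problem is that you increment $i right after storing $name1 in $arrayname at that index, and then try to access the element at $i again, which by then points one past it. Because $arrayname[$i] is undefined at that point, the records are never stored under the ID you later print. Increment $i after storing $info, or use push instead.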

while($name1=<FP1>)
{
  chomp($name1);
  #collect the name according to the Uniq_ID                                                           
  $arrayname[$i]=$name1; # <-- You insert into the array at $i here 
  $i++;                  # <-- You increment $i here
  open(FP2, $file2);
  while($info=<FP2>)
  {
    chomp($info);
    @aa=split(/\s/,$info);
    $name2=$aa[0];
    $seq=$aa[1];
    #if name in Uniq_ID is same with name in information.txt                                                    
    if($name1 =~ /^$name2$/)
    {                                                          
        #hash of arrays"                                                                          
        #put each line of information.txt into a Uniq_ID                                                     
        push @{$data{$arrayname[$i]}}, $info; # <-- You access the element at $i here
      }
  }
  # <-- You should increment $i here (but use push instead)
}
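
The fix described above can be sketched as follows. This keeps the nested-loop structure of the question but uses push for the ID list and keys the hash on $name1 directly, so no index is needed. The subroutine name merge_records and the in-memory arrays are hypothetical, chosen so the example runs without the two files. It also compares IDs with eq instead of a regex match, which is a substitution on my part and avoids surprises if an ID contains regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: takes array refs of IDs and of information lines,
# and returns one merged output line per ID.
sub merge_records {
    my ($ids, $info_lines) = @_;
    my (@arrayname, %data);

    for my $name1 (@$ids) {
        push @arrayname, $name1;    # no manual index to keep in sync
        for my $info (@$info_lines) {
            my ($name2) = split /\s/, $info;
            next unless defined $name2 && $name1 eq $name2;
            push @{ $data{$name1} }, $info;    # hash of arrays keyed by ID
        }
    }

    my @merged;
    for my $id (@arrayname) {
        my $records = $data{$id} || [];
        push @merged, "$id:" . join(';', @$records);
    }
    return @merged;
}

# Sample data from the question.
my @ids  = qw(a12 b13 c14 d15);
my @info = (
    'a12 AAA BBB', 'a12 ppp yyy', 'b13 CCC DDD', 'b13 GGG SSS',
    'c14 HHH KKK', 'c14 JJJ OOO', 'd15 LLL LLL',
);

my @result = merge_records(\@ids, \@info);
print "$_\n" for @result;
```

This still scans every information line once per ID. The single-pass approaches in the other answers avoid that cost and are likely preferable for large files.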