Question

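This is a simple problem, but I cannot find any working solution. I have two files. The first file contains all the IDs I am interested in, e.g. "tomato" and "cucumber", plus some I am not interested in, which have no values in the second file. The second file has the following data structure: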

tomato    red
tomato    round
tomato    sweet
cucumber    green
cucumber    bitter
cucumber    watery

What I need is a file containing all the IDs, each followed by all of its matching values from the second file, everything tab-separated, like this:

tomato    red    round    sweet
cucumber    green    bitter    watery

So far, I have built a hash from the first file:

 while (<FILE>) {  
     chomp;  
     @records = split "\t", $_; 
     {%hash = map { $records[0] => 1 } @records};
 }

And this is what I do with the second file:

  while (<FILE2>) {
      chomp;
      @records2 = split "\t", $_; 
      $key, $value = $records2[0], $records2[1];
      $data{$key} = join("\t", $value);
  }

 close FILE;

 foreach my $key ( keys %data )
 {
     print OUT "$key\t$data{$key}\n"
     if exists $hash{$key} 
 }

I would appreciate a simple solution for combining all the values that match the same ID! :)

Answer 1

For the first file:

while (<FILE>) {  
    chomp;  
    @records = split "\t", $_; 
    $hash{$records[0]} = 1;
}

And for the second:

while (<FILE2>) {
    chomp;
    @records2 = split "\t", $_;
    ($key,$value) = @records2;
    $data{$key} = [] unless exists $data{$key};
    push @{$data{$key}}, $value;
}
close FILE;

foreach my $key ( keys %data ) {
    print OUT $key."\t".join("\t", @{$data{$key}})."\n" if exists $hash{$key};
}
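
For reference, the same approach can be written as a self-contained script under `use strict`. This is only a sketch: the `group_values` helper name and the in-memory filehandles are assumptions made so that the example runs without the real files, and it prints IDs in the order of the first file rather than in hash order.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two open filehandles and returns one
# tab-joined line per wanted ID that has at least one value.
sub group_values {
    my ($ids_fh, $pairs_fh) = @_;

    # Remember the wanted IDs (first column) in the order they appear.
    my (%wanted, @order);
    while (my $line = <$ids_fh>) {
        chomp $line;
        my ($id) = split /\t/, $line;
        next unless defined $id && length $id;
        push @order, $id unless $wanted{$id}++;
    }

    # Collect every value of a wanted ID into an array reference.
    my %values;
    while (my $line = <$pairs_fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line;
        next unless defined $value && $wanted{$key};
        push @{ $values{$key} }, $value;    # autovivifies the array
    }

    return map  { join "\t", $_, @{ $values{$_} } }
           grep { exists $values{$_} } @order;
}

# In-memory filehandles stand in for the two real files (assumption).
my $ids   = "tomato\ncucumber\nonion\n";
my $pairs = "tomato\tred\ntomato\tround\ntomato\tsweet\n"
          . "cucumber\tgreen\ncucumber\tbitter\ncucumber\twatery\n";
open my $ids_fh,   '<', \$ids   or die $!;
open my $pairs_fh, '<', \$pairs or die $!;

print "$_\n" for group_values($ids_fh, $pairs_fh);
```

Here "onion" is skipped because it has no values in the second input; drop the `grep` if such IDs should still be printed on their own.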

Answer 2

This seems to do what is needed:

use strict;
use warnings;

my %data;

open my $fh, '<', 'file1.txt' or die $!;
while (<$fh>) {
  $data{$1} = {} if /^([^\t\n]+)/;  # first column only, without the trailing newline
}

open $fh, '<', 'file2.txt' or die $!;
while (<$fh>) {
  $data{$1}{$2}++ if /^(.+?)\t(.+?)$/ and exists $data{$1};
}

while ( my ($key, $values) = each %data) {
  print join("\t", $key, keys %$values), "\n";
}

Output

tomato  sweet round red
cucumber  green watery  bitter
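
The values above come out in a different order from the input because `keys` returns hash keys in no guaranteed order. If both de-duplication and input order matter, one possible variation is sketched below. The `group_unique` name is hypothetical, and for brevity the sketch groups whatever lines it is given without filtering against the first file.

```perl
use strict;
use warnings;

# Hypothetical helper: groups "key<TAB>value" lines, dropping repeated
# values but keeping keys and values in first-seen order.
sub group_unique {
    my (@lines) = @_;
    my (%values, %seen, @order);
    for my $line (@lines) {
        next unless $line =~ /^([^\t\n]+)\t([^\t\n]+)/;
        my ($key, $value) = ($1, $2);
        push @order, $key unless exists $values{$key};
        push @{ $values{$key} }, $value unless $seen{$key}{$value}++;
    }
    return map { join "\t", $_, @{ $values{$_} } } @order;
}

print "$_\n" for group_unique(
    "tomato\tred", "tomato\tround", "tomato\tred", "cucumber\tgreen",
);
```

The `%seen` hash plays the role of the nested hash in the answer, while the arrays carry the ordering.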

Answer 3

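It is easier if you read the data mapping first.

Also, if you are using Perl, you should consider from the start taking advantage of its main strength: the CPAN libraries. For example, reading in a file is as simple as calling read_file() from File::Slurp, instead of having to open and close the file yourself and run a while (<>) loop.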

use File::Slurp;
my %data;

my @data_lines = File::Slurp::read_file($filename2);
chomp(@data_lines);
foreach my $line (@data_lines) { # Improved version from CyberDem0n's answer
    my ($key, $value) = split("\t", $line);
    $data{$key} ||= []; # Make sure it's an array reference if first time
    push @{ $data{$key} }, $value;
}

my @id_lines = File::Slurp::read_file($filename1);
chomp(@id_lines);
foreach my $id (@id_lines) {
    print join("\t", ( $id, @{ $data{$id} } ) )."\n";
}

Slightly hackier but shorter code adds the ID to the list of values in the data hash from the get-go:

my @data_lines = File::Slurp::read_file($filename2);
chomp(@data_lines);
foreach my $line (@data_lines) { # Improved version from CyberDem0n's answer
    my ($key, $value) = split("\t", $line);
    $data{$key} ||= [ $key ]; # Seed the list with the ID itself for printing
    push @{ $data{$key} }, $value;
}

my @id_lines = File::Slurp::read_file($filename1);
chomp(@id_lines);
foreach my $id (@id_lines) {
    print join("\t", @{ $data{$id} } ) ."\n"; # ID already in %data!
}
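
Both versions assume that File::Slurp is installed and that every ID in the first file has values in the second. Under `use strict`, dereferencing `$data{$id}` for an ID with no entry is a fatal error rather than an empty list. A minimal core-only sketch of the same idea follows; `read_lines` and `lines_for_ids` are hypothetical names, not part of File::Slurp.

```perl
use strict;
use warnings;

# Hypothetical stand-in for File::Slurp::read_file in list context:
# returns the chomped lines of a file, using only core Perl.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Hypothetical helper: one output line per ID that actually has values.
sub lines_for_ids {
    my ($ids, $data) = @_;    # array ref of IDs, hash ref of array refs
    return map  { join "\t", $_, @{ $data->{$_} } }
           grep { exists $data->{$_} } @$ids;
}

my %data = (tomato => [qw(red round sweet)]);
print "$_\n" for lines_for_ids([qw(tomato onion)], \%data);
```

With real files, `read_lines($filename1)` could supply the ID list in place of the `read_file` plus `chomp` pair.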

Retrieve values matching the same ID using Perl
