Question

I'm having some trouble reading a file into a hash in Perl.

Chr1_supercontig_000000000  1   500
    PILOT21_588_1_3_14602_59349_1
Chr1_supercontig_000000001  5   100
    PILOT21_588_1_21_7318_90709_1
    PILOT21_588_1_43_18803_144592_1
    PILOT21_588_1_67_13829_193943_1
    PILOT21_588_1_42_19678_132419_1
    PILOT21_588_1_67_4757_125247_1
...

That is the file I have. The output I want is a hash with the "Chr1" lines as keys and the "PILOT" lines as values.

Chr1_supercontig_000000000 => PILOT21_588_1_3_14602_59349_1
Chr1_supercontig_000000001 => PILOT21_588_1_21_7318_90709_1, PILOT21_588_1_43_18803_144592_1,...

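As far as I know, multiple values can only be assigned to a single key through a reference. Is that correct?

I'm stuck at this point and need help.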

Answer 1

You're right: each hash value needs to be a reference to an array that holds the PILOT lines.

Here is one way to do it:

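use strict;
use warnings;

my %hash;
open my $fh, '<', 'filename.txt' or die "Cannot open filename.txt: $!";
my $key;
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines
    if ($line !~ /^\s/) {
        # Key line: keep only the first whitespace-delimited field
        ($key) = $line =~ /^(\S+)/;
        $hash{$key} = [];
    } else {
        # Value line: strip the indentation and store it under the current key
        $line =~ s/^\s+//;
        push @{ $hash{$key} }, $line;
    }
}
close $fh;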
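
On the question of multiple values per key: a hash value is always a single scalar, so a list of values is stored as an array reference and dereferenced when it is read back. The following is a minimal sketch of reading such a structure. It assumes a small hand-built %hash shaped like the result of the loop above, and format_entry is a made-up helper name used only for illustration.

```perl
use strict;
use warnings;

# Sample data shaped like the result of the loop above; only a few of the
# PILOT lines from the question are included to keep the sketch short.
my %hash = (
    Chr1_supercontig_000000000 => ['PILOT21_588_1_3_14602_59349_1'],
    Chr1_supercontig_000000001 => [
        'PILOT21_588_1_21_7318_90709_1',
        'PILOT21_588_1_43_18803_144592_1',
    ],
);

# Hypothetical helper: render one key and its list of values as a line.
sub format_entry {
    my ($key, $values) = @_;
    return "$key => " . join(', ', @$values);
}

# Each value is an array reference, so the helper dereferences it with @$values.
for my $key (sort keys %hash) {
    print format_entry($key, $hash{$key}), "\n";
}

# A single element is reached with the arrow operator.
print "Second value: $hash{Chr1_supercontig_000000001}->[1]\n";
```

The same dereferencing syntax, @{ $hash{$key} }, is what push uses in the loop to append to the list stored under a key.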

Answer 2

You can read the file line by line, keeping track of the current hash key:

open my $fh, '<', 'file' or die $!;

my (%hash, $current_key);

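while (<$fh>) {
    chomp;
    # Key line (no leading whitespace): remember its first field and move on
    $current_key = $1, next if /^(\S+)/;
    s/^\s+//;    # value line: remove leading whitespace
    push @{ $hash{$current_key} }, $_;
}
close $fh;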

Answer 3

How about:

#!/usr/bin/perl 
use strict;
use warnings;
use Data::Dump qw(dump);

my %hash;
my $key;
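while (<DATA>) {
    chomp;
    if (/^(Chr1_supercontig_\d+)/) {
        # Key line: start an empty list for this supercontig
        $key = $1;
        $hash{$key} = [];    # empty array reference
    } else {
        # Value line: stored as-is, so the leading indentation is kept
        push @{ $hash{$key} }, $_;
    }
}
dump(%hash);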

__DATA__
Chr1_supercontig_000000000  1   500
    PILOT21_588_1_3_14602_59349_1
Chr1_supercontig_000000001  5   100
    PILOT21_588_1_21_7318_90709_1
    PILOT21_588_1_43_18803_144592_1
    PILOT21_588_1_67_13829_193943_1
    PILOT21_588_1_42_19678_132419_1
    PILOT21_588_1_67_4757_125247_1

Output:

(
  "Chr1_supercontig_000000001",
  [
    "    PILOT21_588_1_21_7318_90709_1",
    "    PILOT21_588_1_43_18803_144592_1",
    "    PILOT21_588_1_67_13829_193943_1",
    "    PILOT21_588_1_42_19678_132419_1",
    "    PILOT21_588_1_67_4757_125247_1",
  ],
  "Chr1_supercontig_000000000",
  ["    PILOT21_588_1_3_14602_59349_1"],
)

Answer 4

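There are already many good answers, so I'll add one that does not rely on regular expressions. Instead, it relies on the fact that key lines contain three space- or tab-separated entries, while value lines contain only one.

It strips leading whitespace and the newline automatically, which is rather convenient.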

use strict;
use warnings;

my %hash;
my $key;

while (<DATA>) {
    my @row = split;
    if (@row > 1) {
        $key = shift @row;
    } else {
        push @{$hash{$key}}, shift @row;
    }
}

use Data::Dumper;
print Dumper \%hash;

__DATA__
Chr1_supercontig_000000000  1   500
    PILOT21_588_1_3_14602_59349_1
Chr1_supercontig_000000001  5   100
    PILOT21_588_1_21_7318_90709_1
    PILOT21_588_1_43_18803_144592_1
    PILOT21_588_1_67_13829_193943_1
    PILOT21_588_1_42_19678_132419_1
    PILOT21_588_1_67_4757_125247_1
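
The convenience mentioned above comes from calling split with no arguments: it splits $_ on runs of whitespace and discards leading whitespace, and because the trailing newline is whitespace too, it disappears as well. Here is a small sketch of that behavior. The fields helper is a made-up name for illustration, and it passes ' ' explicitly, which is the same mode a bare split uses.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line the way a bare `split` treats $_.
sub fields {
    my ($line) = @_;
    return split ' ', $line;    # ' ' selects the special whitespace mode
}

my @key_row   = fields("Chr1_supercontig_000000000  1   500\n");
my @value_row = fields("    PILOT21_588_1_3_14602_59349_1\n");

print scalar(@key_row),   " fields on the key line\n";     # 3
print scalar(@value_row), " field on the value line\n";    # 1
print "[$value_row[0]]\n";    # no leading spaces and no newline inside the brackets
```

Counting the fields is what lets the loop tell key lines from value lines without a regular expression.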

Answer 5

Here is another fairly short version:

while (<>) {
   if(/^Chr\S+/) {
      $c=$&;
   } else {
      /\S+/;
      push @{ $p{$c} }, $&;
   }
}

To print the result:

foreach my $pc ( sort keys %p ) {
   print "$pc => ".join(", ", @{$p{$pc}})."\n";
}

Here is a shorter way to print the result (though the first one looks more readable to me):

map { print "$_ => ".join(", ", @{$p{$_}})."\n" } sort keys %p;

As a one-liner on the command line:

perl <1 -e 'while(<>){ if(/^Chr\S+/){ $c=$&; }else{ /\S+/; push(@{$p{$c}},$&);} } map { print "$_ => ".join(", ", @{$p{$_}})."\n" } sort keys %p;'

Answer 6

Try this:

#!/usr/bin/perl 
use strict;
use warnings;
use Data::Dumper;

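my $cur;
my $hash = {};    # hash reference: key line => comma-joined PILOT lines
open my $fh, '<', 'file' or die "Cannot open file: $!\n";

while (<$fh>) {
    chomp;
    if (/^(Chr\S+)/) {
        # Key line: capture the name without the trailing space
        $cur = $1;
        $hash->{$cur} = '';
    }
    else {
        # Value line: strip the indentation, then append it to the string
        s/^\s+//;
        $hash->{$cur} .= $_ . ',';
    }
}
close $fh;

print Dumper $hash;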

Reading an entire file into a hash in Perl

6 answers: