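Steps to create a hash of hashes in a Perl script

Time: 2016-03-09 05:42:41

Tags: regex perl file hash

Below is the hash of hashes I want to create, followed by the contents of my file. When I open the input file, I want to grep some data out of the table and build a hash in the following pattern.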

{
      'Golden' => {
                      'PT' => 80,
                      'po' => 43,
                      'DFF' => 139145,
                      'DLAT' => 276,
                      'Z' => 4,
                      'BBOX' => 833,
                      'Total' => 140387
                    },
      'Foo Bar' => {
                      'PT' => 80,
                      'po' => 43,
                      'DFF' => 139145,
                      'DLAT' => 276,
                      'Z' => 4,
                      'BBOX' => 833,
                      'Total' => 140387
                   }
    };

File contents

Sample 1

 Mapped points: SYSTEM class
 Mapped points     PI     PO     BBOX      Total   
 Golden            86     43     833       140387  
 Revised           86     43     833       140387  

Sample 2

 Mapped points: SYSTEM class
 Mapped points     PI     PO     DFF    DLAT   Z      BBOX      Total   
 Golden            86     43     139145 276    4      833       140387  
 Revised           86     43     139145 276    4      833       140387  

And here is where I am stuck.

if ($line =~ m/^Mapped points: SYSTEM class$/){
    $var = "golden_mapped-points_PI";
    $hash{$var} = $6;
    next unless $line = <$filehandle> and $line =~ /^Mapped points     PI     PO     DFF    DLAT   Z      BBOX      Total   $/; 
}

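Also, the number of columns can vary between input files; for example, a file might contain only PI and PO. I also do not understand how to process the input file.

3 answers:

Answer 0: (score: 1)

You need to keep a copy of the headings line, because the headings are the keys of the inner hash entries. Once you start processing the data lines, you need to shift off the first item to use as the key of the outer hash entry; it is a kind of label for the row. Once that is set aside, your headings and data "line up":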

#!/usr/bin/env perl
use v5.12 ;
use Data::Dumper ;

my $starting_re = 'Mapped points: SYSTEM class' ;
my $headings_re = 'Mapped \s+ points \s+' ;
my %outer_hash;

# Find the starting line
while (<>) { last if /$starting_re/ }

# Read in one line.  Expect it to be the headings line
# Split it into pieces and store them for later
my @headings ;
my $headings_line = <> ;
if ($headings_line =~ / $headings_re (.*) /x) {
    @headings = split ' ', $1 ;
}
else {
    say "Don't understand the headings line.";
    exit 1;
}

# Process each remaining line
while (<>) {
    # Skip blank lines so they do not create an entry with an undefined label
    next unless /\S/;

    # Split the data line and shift off
    # the first piece to use as a line label
    my @data = split ' ', $_;
    my $label = shift @data ;

    # Initialize a new inner hash and an index variable
    my %inner_hash = ();
    my $index = 0;

    # Iterate over the data pieces building up the hash
    for (@data) {
        $inner_hash{ $headings[$index] }  =  $_ ;
        $index++;
    }

    # Alternatively, build it with a hash slice
    # @inner_hash{ @headings }  =  @data ;

    # Create an entry in the outer hash by
    # taking a reference to our inner hash
    $outer_hash{ $label } = \%inner_hash;
}

say Dumper( \%outer_hash );
exit 0;
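
The hash-slice alternative mentioned in the comments above can be sketched as a small reusable sub. This is only a minimal sketch under some assumptions: the name parse_mapped_points is hypothetical, the input is passed in as a list of lines rather than read from a filehandle, and blank lines are simply skipped. Because the inner keys come from whatever headings line is found, it copes with files that have only some of the columns.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# parse_mapped_points is a hypothetical helper name used for illustration.
# It takes a list of lines and returns a reference to a hash of hashes.
sub parse_mapped_points {
    my @lines = @_;
    my (%outer_hash, @headings);

    for my $line (@lines) {
        next unless $line =~ /\S/;               # skip blank lines
        next if $line =~ /^\s*Mapped points:/;   # skip the section title

        if ($line =~ /^\s*Mapped\s+points\s+(.*)/) {
            @headings = split ' ', $1;           # column names become inner keys
            next;
        }
        next unless @headings;                   # ignore rows before the headings

        my ($label, @data) = split ' ', $line;   # first field labels the row
        my %inner_hash;
        @inner_hash{@headings} = @data;          # hash slice pairs headings with values
        $outer_hash{$label} = \%inner_hash;
    }
    return \%outer_hash;
}

# Sample 1 from the question (only four columns)
my $table = parse_mapped_points(
    ' Mapped points: SYSTEM class',
    ' Mapped points     PI     PO     BBOX      Total',
    ' Golden            86     43     833       140387',
    ' Revised           86     43     833       140387',
);

print "$table->{Golden}{PI}\n";      # prints 86
print "$table->{Revised}{Total}\n";  # prints 140387
```

Note that splitting on whitespace assumes the row label is a single word; a label such as 'Foo Bar' would be split in two.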

Answer 1: (score: 0)

I came up with a solution that uses unpack. It handles any number of columns across different files.

#!/usr/bin/perl
use strict;
use warnings;

my %data;

my ($len, @hdrs);
my (@lengths, $template);

while (<DATA>) {
    next if 1 .. /^Mapped points:/;

    s/\s+$//; # safe 'chomp' (if there are spaces at end of string)
    next unless length; # skip blank lines so they do not create an empty key

    if (s/^(Mapped points\s+)//) {
        $len = length $1;
        @hdrs = split;

        # lengths of all but the last (Total) field
        @lengths = map {length} /(\S+\s+)/g;

        # add 'A*' at the end to capture the 'Total' field
        $template = join("", "A$len", map {'A'. $_} @lengths) . "A*";
    }
    else {

        my ($key, @rest) = unpack $template;

        my %temp;
        @temp{ @hdrs } = @rest;
        $data{$key} = \%temp;
    }   
}
use Data::Dumper; print Dumper \%data;

__DATA__
Mapped points: SYSTEM class
Mapped points     PI     PO     DFF    DLAT   Z      BBOX      Total
Golden            86     43     139145 276    4      833       140387
Revised           86     43     139145 276    4      833       140387

The output is:

$VAR1 = {
          'Golden' => {
                        'Total' => '140387',
                        'BBOX' => '833',
                        'Z' => '4',
                        'PO' => '43',
                        'DLAT' => '276',
                        'DFF' => '139145',
                        'PI' => '86'
                      },
          'Revised' => {
                         'Total' => '140387',
                         'BBOX' => '833',
                         'Z' => '4',
                         'PO' => '43',
                         'DLAT' => '276',
                         'DFF' => '139145',
                         'PI' => '86'
                       }
        };
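
One case where the fixed-width approach may help is a row label that contains a space, such as the 'Foo Bar' key in the question's desired hash. The sketch below is not part of the answer above. It assumes the label column is exactly as wide as the "Mapped points" heading plus its trailing spaces, and the 'Foo Bar' row is made-up illustration data. It reads only the label by position and splits the rest on whitespace.

```perl
use strict;
use warnings;

# Illustrative lines; the 'Foo Bar' row is an assumption, not real input.
my $header = 'Mapped points     PI     PO     BBOX      Total';
my $row    = 'Foo Bar           80     43     833       140387';

# Width of the label column, measured from the header line.
my ($label_col) = $header =~ /^(Mapped points\s+)/;
my $width = length $label_col;

my @hdrs = split ' ', substr($header, $width);

# A plain whitespace split would break the label into 'Foo' and 'Bar'
# and shift every value one column to the right.
# The 'A' format strips trailing spaces, so the label stays 'Foo Bar'.
my ($label, $rest) = unpack "A$width A*", $row;

my %inner;
@inner{@hdrs} = split ' ', $rest;

print "$label: PI=$inner{PI} Total=$inner{Total}\n";
# prints: Foo Bar: PI=80 Total=140387
```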

Answer 2: (score: -1)

I thought it over and found a solution! I know it is not the best solution yet, so I would ask you to suggest a better one.

cat > test.pl
#!/usr/bin/perl  

use strict;
use warnings;

while(<DATA>){
my $line = $_;
if ($line =~ m/^Mapped points: SYSTEM class$/){
    next unless $line = <DATA> and $line =~ /^Mapped points/; 
    my @arr = split(' ',$line);

    next unless $line = <DATA>;

    my @golden_arr = split(' ',$line);
    my $size1 = @golden_arr;
    for(my $i = 1;$i < $size1 ;$i++){
        my $var = "Golden_".$arr[$i+1];
        print "$var, $golden_arr[$i]\n";
    }
    next unless $line = <DATA>;
    my @revised_arr = split(' ',$line);
    my $size2 = @revised_arr;
    for(my $i = 1;$i < $size2 ;$i++){
        my $var = "Revised_".$arr[$i+1];
        print "$var, $revised_arr[$i]\n";
    }   
}
}


__DATA__

Mapped points: SYSTEM class
Mapped points     PI     PO     DFF    DLAT   Z      BBOX      Total   
Golden            86     43     139145 276    4      833       140387  
Revised           86     43     139145 276    4      833       140387  

perl test.pl
Golden_PI, 86
Golden_PO, 43
Golden_DFF, 139145
Golden_DLAT, 276
Golden_Z, 4
Golden_BBOX, 833
Golden_Total, 140387
Revised_PI, 86
Revised_PO, 43
Revised_DFF, 139145
Revised_DLAT, 276
Revised_Z, 4
Revised_BBOX, 833
Revised_Total, 140387

Is there a better solution?
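
One possible direction, sketched under assumptions: instead of building flattened names such as Golden_PI, keep the values in a hash of hashes and generate the flattened output only when printing. In the sketch below the %table contents and the @columns list are written out by hand for illustration; in a real script they would come from parsing the file.

```perl
use strict;
use warnings;

# Illustrative subset of the headings, kept in an array to preserve column order.
my @columns = qw(PI PO BBOX Total);

# Hand-written stand-in for the parsed data.
my %table = (
    Golden  => { PI => 86, PO => 43, BBOX => 833, Total => 140387 },
    Revised => { PI => 86, PO => 43, BBOX => 833, Total => 140387 },
);

for my $label (sort keys %table) {
    for my $column (@columns) {
        # Tolerate files in which some columns are missing.
        next unless exists $table{$label}{$column};
        print "${label}_$column, $table{$label}{$column}\n";
    }
}
```

A single value is then available directly as $table{Golden}{PI}, with no need to build a variable name from strings.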