Perl - Parsing file text into a hash

Time: 2013-03-18 12:11:11

Tags: perl parsing hash

I want to parse the text of a file and put it into a hash. My file looks like this:

key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val

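My keys come before the space, and my values are the comma-separated elements after the space. Some lines have no key because the values continue over several lines.

So I want a hash like this (written in Python, which I am most familiar with):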

hash={'key1':[val,val,...],'key2':[val,val,...]} 

My code:

my %hashNames;
open INFILE, "./file.txt" or die $!;
my @temp = ();

while (my $line = <INFILE>)
{

    my @names = split /[\t,]/, $line;
    my $ID = $names[0];
    if ( $line =~ /\t/ )
    {

        my @temp=();
        for (my $i = 1; $i < @names; $i +=1)
        {
            push (@temp, $names[$i]);
        }

    }
    else
    {   

        for (my $i = 0; $i < @names; $i +=1)
        {
            push (@temp, $names[$i]);
        }       
    }
}

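5 answers:

Answer 0 (score: 3)

Your problem is that newlines no longer separate your records. The way to handle that is to disable the default input record separator $/, which does not work for this format, and emulate one that does: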

use strict;
use warnings;
use Data::Dumper;

my %hash;
my $file;
{
    local $/;         # disable input record separator
    $file = <DATA>;   # entire file here now!
}

for my $line (split /^(?=\S+ )/m, $file) {  # records begin this way now
    $line =~ s/\n//g;                       # remove newlines
    my ($key, $val) = split ' ', $line, 2;  # divide into two fields
    $hash{$key} = [ split /,/, $val ];      # store the data
}

print Dumper \%hash;

__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val

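Explanation:

  • We split on /^(?=\S+ )/m. The /m modifier means that ^ also matches right after each newline inside the string, and the lookahead only accepts lines that start with a key followed by a space. This emulates an input record separator.
  • Splitting each record into two fields is done by passing a LIMIT of 2 to split.
  • We store the values directly in the hash by wrapping the split statement in an anonymous array [ ... ].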
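
To make the three points above concrete, here is a minimal sketch on a tiny made-up input (the string in $text and the keys k1 and k2 are assumptions for illustration, not the asker's data). It shows where the split cuts and what ends up in the hash:

```perl
use strict;
use warnings;

# Hypothetical sample: two records, the first one continued on a second line.
my $text = "k1 a,b,\nc\nk2 d\n";

# /m lets ^ match after every newline; the lookahead keeps only line starts
# that look like "key followed by a space", so the line "c" stays with k1.
my @records = split /^(?=\S+ )/m, $text;

my %hash;
for my $record (@records) {
    $record =~ s/\n//g;                        # join the continuation lines
    my ($key, $val) = split ' ', $record, 2;   # LIMIT 2: the key, then the rest
    $hash{$key} = [ split /,/, $val ];         # anonymous array as the hash value
}

print "$_ => @{ $hash{$_} }\n" for sort keys %hash;
```

Note that the continuation line is only recognised because it contains no space. A value that itself contains a space at the right position would be mistaken for a key.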

Answer 1 (score: 2)

Use the Parse::RecDescent module:

#! /usr/bin/env perl

use strict;
use warnings;

use Parse::RecDescent;

our %hash;
my $p = Parse::RecDescent->new(q!
  hash: entry(s?)
  entry: key value(s /,/)  { $::hash{$item[1]} = [ @{ $item[2] } ] }
  key: /\S+/
  value: /([^,\n]|\\,])+/
!);
die "$0: failed to create parser" unless defined $p;

my $text = do {{ local $/; <DATA> }};
$p->hash($text) or die "$0: parse failed";

for (sort keys %hash) {
  print "$_ => val x ", scalar @{ $hash{$_} }, "\n";
}

__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val

Output:

key1 => val x 22
key2 => val x 22
key3 => val x 1
key4 => val x 2
key5 => val x 52

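Answer 2 (score: 1)

The difficulty here is that your records are terminated by "a newline that is not preceded by a comma". Unfortunately, the input record separator $/ cannot be set to a regex. This leaves three comfortable solutions:

  1. Load the whole file into memory. This is not as bad as it sounds, because we end up holding the same amount of information in the hash anyway. Then we can split on /(?<!,)\n/ to get the actual records.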

    my %hash = do {
      local $/; # set to undef, for slurp
      map {
        my ($key, $vals) = split /\s+/, $_, 2; # split on first whitespace, into two strings
        $key => [ split /\s*,\s*/, $vals ];    # return a list of a key and a value array
      } split /(?<!,)\n/, <FILE>;              # split the file into records
    };
    
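  2. We could write a replacement for readline that buffers the input and can terminate lines with a regex.

  3. We could treat a trailing comma as a line-continuation marker.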

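    my %hash;
    while(<FILE>) {
      $_ .= <FILE> while /,\n\z/ && !eof(FILE); # trailing comma: append the next line (stop at end of file)
      chomp;                                    # drop the record's final newline so it does not end up in the last value
      my ($key, $value) = split /\s+/, $_, 2;
      push @{ $hash{$key} }, split /\s*,\s*/, $value; # allow multiple occurrences of one key, simply append values to list.
    }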
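
Option 2 has no code above, so here is one way it could look. This is only a sketch under assumptions: make_record_reader is a hypothetical helper name, the input is a small made-up string read through an in-memory filehandle, and the buffer is filled one physical line at a time, so it suits terminators like /(?<!,)\n/ that can be decided at the end of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator that yields one record per call.
# A record ends wherever the regex $terminator matches in the buffered input.
sub make_record_reader {
    my ($fh, $terminator) = @_;
    my $buffer = '';
    return sub {
        while (1) {
            if ($buffer =~ $terminator) {
                my ($start, $end) = ($-[0], $+[0]);      # offsets of the match
                my $record = substr($buffer, 0, $start); # text before the terminator
                substr($buffer, 0, $end) = '';           # drop record and terminator
                return $record;
            }
            my $chunk = <$fh>;                           # buffer one more line
            if (!defined $chunk) {                       # end of input
                return unless length $buffer;
                my $record = $buffer;                    # unterminated last record
                $buffer = '';
                return $record;
            }
            $buffer .= $chunk;
        }
    };
}

my $data = "k1 a,b,\nc\nk2 d\n";                         # made-up sample input
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $next_record = make_record_reader($fh, qr/(?<!,)\n/);

my %hash;
while (defined(my $record = $next_record->())) {
    my ($key, $vals) = split /\s+/, $record, 2;
    $hash{$key} = [ split /\s*,\s*/, $vals ];
}
print "$_ => @{ $hash{$_} }\n" for sort keys %hash;
```

Compared with slurping the whole file, this keeps only the current record in memory, at the cost of more code.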
    

Answer 3 (score: 0)

Here you go:

my %results;
my $key;
while(my $line = <INFILE>) {
    chomp($line);
    my @items = split(/[\s,]+/, $line);  # split on the space after the key and on the commas
    $key = shift @items;                 # the first item is the key
    $results{$key} = \@items;
}

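That works for the simple cases, but not for this part of your question:

"Some lines have no key because the values continue over several lines."

To handle that, you have to explain how to detect whether the next line starts with a key or with a value. If you know that, you can put the test in an if statement and use the previous key to add the new values to the hash: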

my %results;
my $key;
while(my $line = <INFILE>) {
    chomp($line);
    my @items = split(/[\s,]+/, $line);
    my $tmpkey = shift @items;
    if (is_real_key($tmpkey)) {          # is_real_key() is the test you have to supply
        $key = $tmpkey;                  # the first item is the new key
        $results{$key} = \@items;
    } else {
        push (@{$results{$key}}, $tmpkey, @items);  # continuation line: append to the previous key
    }
}
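
For the file format shown in the question, one possible test is "the line starts with a non-blank token followed by a space". That rule is an assumption on my part, and is_key_line below is a hypothetical name; it looks at the whole line rather than at a single item. A minimal runnable sketch on made-up data:

```perl
use strict;
use warnings;

# Hypothetical rule, not confirmed by the asker: a line starts a new record
# when it begins with a non-blank token followed by a space.
sub is_key_line { return $_[0] =~ /^\S+ / }

my $data = "k1 a,b,\nc\nk2 d\n";   # made-up sample input
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my (%results, $key);
while (my $line = <$in>) {
    chomp $line;
    if (is_key_line($line)) {
        my ($new_key, $rest) = split ' ', $line, 2;      # key, then the value list
        $key = $new_key;
        $results{$key} = [ split /,\s*/, $rest ];
    } elsif (defined $key) {
        push @{ $results{$key} }, split /,\s*/, $line;   # continuation of the previous key
    }
}
print "$_ => @{ $results{$_} }\n" for sort keys %results;
```

If your real values can contain spaces, this rule is not enough and you need a different way to tell keys from values.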

Answer 4 (score: 0)

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $res_hash = {};
my ($current_key, $values);
while ( my $line = <DATA> ) {
  chomp $line;
  if ( index($line, ' ') > 0 ) {
    # a new "key values" line: store the record collected so far
    push @{ $res_hash->{$current_key} }, split(/,/, $values) if defined $current_key;
    ($current_key, $values) = split /\s/, $line, 2;
  } else {
    # a continuation line: append it to the values of the current record
    $values .= $line;
  }
}
# store the last record, whether it ended on a key line or on a continuation line
push @{ $res_hash->{$current_key} }, split(/,/, $values) if defined $current_key;

say "result:".Dumper($res_hash);



__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val