I want to parse a text file and put its contents into a hash. My file looks like this:
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val
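My keys come before the space, and my values are the comma-separated list of elements after the space. Some lines have no key, because the values continue over several lines.
So I want a hash like this (I'm most familiar with Python):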
hash={'key1':[val,val,...],'key2':[val,val,...]}
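For reference, the Perl counterpart of that Python dict is a hash whose values are array references (a "hash of arrays"). A minimal sketch, using placeholder values rather than the real file contents:

```perl
use strict;
use warnings;

# A hash whose values are array references ("hash of arrays"):
# the Perl counterpart of {'key1': [...], 'key3': [...]} in Python.
my %hash = (
    key1 => [ 'val', 'val', 'val' ],
    key3 => [ 'val' ],
);

# Append to the list stored under a key; the array is created
# automatically (autovivified) if the key does not exist yet.
push @{ $hash{key4} }, 'val', 'val';

# Report how many values each key holds.
for my $key (sort keys %hash) {
    printf "%s has %d value(s)\n", $key, scalar @{ $hash{$key} };
}
```

Dereferencing with `@{ $hash{$key} }` gives back the whole list, much like `hash['key1']` does in Python.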
My code:
my %hashNames;
open INFILE, "./file.txt" or die $!;
my @temp = ();
while (my $line = <INFILE>)
{
    my @names = split /[\t,]/, $line;
    my $ID = $names[0];
    if ( $line =~ /\t/ )
    {
        my @temp=();
        for (my $i = 1; $i < @names; $i +=1)
        {
            push (@temp, $names[$i]);
        }
    }
    else
    {
        for (my $i = 0; $i < @names; $i +=1)
        {
            push (@temp, $names[$i]);
        }
    }
}
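Answer 0 (score: 3)
Your problem is that newlines no longer separate your records. So the way to handle it is to disable the default input record separator $/, which no longer does the job, and emulate a separator that does: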
use strict;
use warnings;
use Data::Dumper;
my %hash;
my $file;
{
    local $/;         # disable input record separator
    $file = <DATA>;   # entire file here now!
}
for my $line (split /^(?=\S+ )/m, $file) {   # records begin this way now
    $line =~ s/\n//g;                         # remove newlines
    my ($key, $val) = split ' ', $line, 2;    # divide into two fields
    $hash{$key} = [ split /,/, $val ];        # store the data
}
print Dumper \%hash;
__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val
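To see what the record-splitting step in this answer does on its own, here is a small sketch that runs the same split on a made-up string (the values `a`, `b`, ... are placeholders, not the question's data):

```perl
use strict;
use warnings;

# A small sample in the question's format: the first record spans two lines.
my $text = "key1 a,b,\nc,d\nkey2 e\n";

# Split wherever a line starts with "non-spaces followed by a space".
# /m lets ^ match after each newline; the lookahead (?=...) is zero-width,
# so the key itself stays at the start of each record.
my @records = split /^(?=\S+ )/m, $text;

for my $record (@records) {
    (my $shown = $record) =~ s/\n/\\n/g;   # make the newlines visible
    print "[$shown]\n";
}
```

The continuation line `c,d` is not a split point because it contains no space, so it stays attached to `key1`.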
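Explanation:
Splitting with /^(?=\S+ )/m: the /m modifier means that ^ now also matches right after each newline inside the string, which emulates the input record separator. Because (?=\S+ ) is a zero-width lookahead, each split happens just before a line that starts with a key followed by a space, and the key itself is kept.
Putting the split statement inside [ ... ] creates an anonymous array reference, so the values are split directly into the hash.
Answer 1 (score: 2)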
#! /usr/bin/env perl
use strict;
use warnings;
use Parse::RecDescent;
our %hash;
my $p = Parse::RecDescent->new(q!
hash: entry(s?)
entry: key value(s /,/) { $::hash{$item[1]} = [ @{ $item[2] } ] }
key: /\S+/
value: /([^,\n]|\\,])+/
!);
die "$0: failed to create parser" unless defined $p;
my $text = do {{ local $/; <DATA> }};
$p->hash($text) or die "$0: parse failed";
for (sort keys %hash) {
    print "$_ => val x ", scalar @{ $hash{$_} }, "\n";
}
__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val
Output:
key1 => val x 22
key2 => val x 22
key3 => val x 1
key4 => val x 2
key5 => val x 52
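Answer 2 (score: 1)
The difficulty here is that your records are terminated by "a newline that is not preceded by a comma". Unfortunately, the input record separator $/ cannot be set to a regex. That leaves three convenient solutions:
1. Load the whole file into memory. This is not as bad as it sounds, since we end up with the same amount of information in the hash anyway. Then we can split /(?<!,)\n/ to get the actual records.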
open FILE, '<', './file.txt' or die "Cannot open ./file.txt: $!";
my %hash = do {
    local $/; # set to undef, for slurp
    map {
        my ($key, $vals) = split /\s+/, $_, 2; # split on first whitespace, into two strings
        $key => [ split /\s*,\s*/, $vals ];    # return a list of a key and a value array
    } split /(?<!,)\n/, <FILE>;                # split the file into records
};
2. We could write a replacement for readline that buffers the input and can terminate lines at a regex match.
3. We can treat a trailing comma as a line-continuation character:
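my %hash;
while (<FILE>) {
    # A trailing comma means the record continues on the next line.
    # Stop at EOF so a dangling comma on the last line cannot loop forever.
    while (/,\n\z/ and defined(my $next = <FILE>)) {
        $_ .= $next;
    }
    chomp;   # drop the final newline so the last value does not keep it
    my ($key, $value) = split /\s+/, $_, 2;
    push @{ $hash{$key} }, split /\s*,\s*/, $value; # allow multiple occurrences of one key, simply append values to list.
}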
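The second option (a buffering readline replacement) is only described, not shown. As a rough sketch of what it could look like, here is `make_regex_reader`, a hypothetical helper invented for this example (not a core Perl function or a CPAN module). It buffers input with `read` and returns one record per match of a terminator regex such as `(?<!,)\n`:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that works like readline, except
# that each record ends where $terminator matches. Assumes a match near the
# end of the buffer would not change if more input arrived, which holds for
# a fixed pattern like qr/(?<!,)\n/.
sub make_regex_reader {
    my ($fh, $terminator) = @_;
    my $buffer = '';
    my $eof    = 0;
    return sub {
        while (1) {
            # Hand back everything up to and including the first terminator.
            if ($buffer =~ s/\A(.*?$terminator)//s) {
                return $1;
            }
            if ($eof) {
                # No terminator left: return what remains once, then undef.
                return if $buffer eq '';
                my $rest = $buffer;
                $buffer = '';
                return $rest;
            }
            # Not enough data buffered yet: read another chunk.
            my $got = read($fh, my $chunk, 4096);
            die "read failed: $!" unless defined $got;
            if ($got == 0) { $eof = 1 } else { $buffer .= $chunk }
        }
    };
}

# Usage on an in-memory file with placeholder values.
my $sample = "key1 a,b,\nc\nkey2 d\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $next_record = make_regex_reader($fh, qr/(?<!,)\n/);

my %hash;
while (defined(my $record = $next_record->())) {
    chomp $record;    # drop the terminating newline
    my ($key, $vals) = split /\s+/, $record, 2;
    $hash{$key} = [ split /\s*,\s*/, $vals ];
}
```

Compared with slurping, this keeps only one partial record in memory at a time, at the cost of a little more code.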
Answer 3 (score: 0)
Here you go:
my %results;
my $key;
while(my $line = <INFILE>) {
    chomp($line);
    # split on commas and on the whitespace between the key and its values
    my @items = split(/[\s,]+/, $line);
    $key = shift @items;
    $results{$key} = \@items;
}
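This works for the simple case, except for your statement:
Some lines have no key, because the values continue over several lines.
To handle that, you have to say how to detect whether a line starts with a key or continues the previous values. If you can, put that test in an if statement and use the previous key to add the new values to the hash: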
my %results;
my $key;
while(my $line = <INFILE>) {
    chomp($line);
    my @items = split(/[\s,]+/, $line);
    my $tmpkey = shift @items;
    if (is_real_key($tmpkey)) {   # is_real_key() is a placeholder you must write
        $key = $tmpkey;           # $tmpkey was already shifted off @items
        $results{$key} = \@items;
    } else {
        push (@{$results{$key}}, $tmpkey, @items);
    }
}
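This answer leaves `is_real_key` undefined. One possible way to fill the gap, assuming (as in the question's sample data) that a new record starts exactly when the first token on a line is followed by whitespace, is to test the raw line with a regex instead of calling a separate helper. A sketch, reading from an in-memory string with placeholder values rather than `INFILE`:

```perl
use strict;
use warnings;

# Placeholder input standing in for the real file.
my $sample = "key1 a,b,\nc,d\nkey2 e\nkey3 f,g\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my %results;
my $key;
while (my $line = <$in>) {
    chomp $line;
    next unless length $line;    # ignore blank lines
    # Assumption: a line starts a new record exactly when its first token is
    # followed by whitespace ("key1 a,b,"); continuation lines ("c,d") are not.
    if (my ($new_key, $rest) = $line =~ /^(\S+)\s+(.*)$/) {
        $key = $new_key;
        $results{$key} = [ split /\s*,\s*/, $rest ];
    } elsif (defined $key) {
        # Continuation line: append to the most recent key's list.
        push @{ $results{$key} }, split /\s*,\s*/, $line;
    }
}
close $in;
```

Capturing into lexicals with a list assignment avoids reusing `$1`/`$2` after further regex operations such as `split`.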
Answer 4 (score: 0)
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $res_hash = {};
my ($current_key, $values);
while ( my $line = <DATA> ) {
    chomp $line;
    if ( index($line, ' ') > 0 ) {
        # A new key line: store the previous record before starting this one
        push( @{ $res_hash->{$current_key} }, split(/,/, $values) ) if defined $current_key;
        ($current_key, $values) = split( /\s+/, $line, 2 );
    } else {
        # Continuation line: append to the current record's values
        $values .= $line;
    }
}
# Store the last record, whether or not it had continuation lines
push( @{ $res_hash->{$current_key} }, split(/,/, $values) ) if defined $current_key;
say "result:".Dumper($res_hash);
__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val