I have a space-delimited file like this:
First Second Third Forth
It is possible to
do this task
with regex but i
don't know how to
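My task is to capture all the words on each line and build a hash from them.
But here is my problem: a field in any column may be empty (for example, line 3, field 3).
The words on each line are aligned to either the start or the end of their column. (The column names are the words on the first line, i.e. First Second Third Forth.)
In my example, the words in the First, Third and Forth columns are aligned to the left (the start of the column name), and the words in the Second column are aligned to the right (the end of the column name).
Using the hash for each line, I have to produce output in the following format: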
$hash{First} has Second-property $hash{Second}. It also has $hash{Third} and $hash{Forth}.
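As a minimal sketch of that output step, assuming each row has already been parsed into a hash with the keys First, Second, Third and Forth. The helper name `format_row` is hypothetical, and rendering an empty field as an empty string is an assumption, since the required handling of empty fields is not stated:

```perl
use strict;
use warnings;

# Hypothetical helper: build the output sentence for one parsed row.
# Missing or undefined fields are rendered as an empty string (an assumption).
sub format_row {
    my ($row) = @_;
    my %h = map { $_ => $row->{$_} // '' } qw(First Second Third Forth);
    return "$h{First} has Second-property $h{Second}. "
         . "It also has $h{Third} and $h{Forth}.";
}

print format_row({ First => 'It', Second => 'is', Third => 'possible', Forth => 'to' }), "\n";
```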
use File::Basename;
use locale;

# Fail early if either file cannot be opened.
open my $file, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open my $file2, ">>", fileparse($ARGV[0]) . "2.txt" or die "Cannot open output file: $!";
my @alls = <$file>;

sub Main {
    my $first = shift @alls;
    my $poses = First_And_Last($first);
    my $curr_poses;
    my $curr_hash;
    #do{OutputLine($_->[0],$_->[1],$first)}for (@$poses);
    my $result_array = [];
    my @keys = qw(# Variable Type Len Format Informat Label);
    for my $word (@alls) {
        $curr_poses = First_And_Last($word);
        undef($curr_hash);
        $curr_hash = Take_Words($poses, $word, $curr_poses);
        push @{$result_array}, $curr_hash;    # AoH
    }
    #end of main
}

sub First_And_Last {
    #First_And_Last($str)
    # Returns a reference to an array of [begin, end] offsets, one per word.
    my $str = shift;
    my $begin;
    my $end;
    my $ref = [];
    while ($str =~ m/(([\S\.]\s?)+\b|#)/g) {
        $begin = pos($str) - length($1);
        $end   = pos($str);
        push @{$ref}, [$begin, $end];
    }
    return $ref;
}

sub Take_Words {
    #Take_Words($poses, $line,$current)
    my $outref  = {};
    my $ref     = shift;    # take the ref of offsets of words
    my $line    = shift;    # and the next line in file
    my $current = shift;    # and this is the poses of current line
    my @keys = qw(# Variable Type Len Format Informat Label);
    do { $outref->{$_} = undef; } for (@keys);
    my $ethalon;       # for $ref
    my $relativity;    # for $current
    my $key;           # for key in $outref
    my @ethalon = @{$ref};
    $ethalon    = shift @ethalon;
    $relativity = shift @{$current};
    $key        = shift @keys;
    while (defined($key) && defined($relativity)) {
        # A word belongs to a column if it starts or ends where the header does.
        if ($ethalon->[0] == $relativity->[0] || $ethalon->[1] == $relativity->[1]) {
            $outref->{$key} = substr($line, $relativity->[0], $relativity->[1] - $relativity->[0]);
            $relativity = shift @{$current};
        }
        $ethalon = shift @ethalon;
        $key     = shift @keys;
    }
    return $outref;
}
Answer 0 (score: 2)
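Here is my algorithm, although it is a bit C-ish:
1. Determine the starting position of each column header and store it.
2. For each column: go to the header's starting position.
3. Move left until you have seen two consecutive spaces.
4. Move right two characters and remember that position.
5. Move right until you have seen two consecutive spaces.
6. Move left two characters and remember that position.
7. Extract everything between the two boundaries you found.
8. Strip leading and trailing whitespace.
9. Store the result in the hash.
10. Repeat from step 2.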
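The boundary search for a single column (steps 2 to 8) could be sketched roughly as follows. `extract_field` is a hypothetical helper name, not part of the original code. The sketch assumes that columns are separated by at least two spaces and that the value overlaps the header's starting column; it uses `rindex` and `index` in place of the character-by-character walk.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the field that covers column $start in $line.
# Assumes fields are separated by at least two spaces.
sub extract_field {
    my ($line, $start) = @_;
    return '' if $start >= length $line;

    # Left boundary: just past the nearest two-space run at or before $start,
    # or the start of the line if there is none.
    my $left = rindex($line, '  ', $start);
    $left = $left < 0 ? 0 : $left + 2;

    # Right boundary: the next two-space run at or after $start,
    # or the end of the line if there is none.
    my $right = index($line, '  ', $start);
    $right = length $line if $right < 0;

    return '' if $right <= $left;    # nothing but spaces at this column
    my $field = substr($line, $left, $right - $left);
    $field =~ s/^\s+|\s+$//g;        # strip leading and trailing whitespace
    return $field;
}

my $row = 'It         is  possible  to';
print extract_field($row, 15), "\n";    # possible
```

With this sketch, a value that is right-aligned and shorter than its header does not overlap the header's starting column and comes back empty; one way to cover that case would be to run the same search from the header's end position instead.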
Now let's look at the implementation:
Step 1:
my @starting;
{
    my @char = split m{}, <$file>;    # split the first line into char array
    my $spacecount = 0;
    my $state = 1;                    # 1 : find start -- 0 : find end
    for (my $i = 0; $i < @char; $i++) {
        if ($state) {                 # find next non-space
            if ($char[$i] =~ /\s/) {
                next;
            } else {
                $state = not $state;  # flip
                $spacecount = 0;
                push @starting, $i;
                next;
            }
        } else {
            if ($char[$i] =~ /\s/) {
                $spacecount++;
                if ($spacecount >= 2) {
                    $state = not $state;    # flip
                    next;
                }
            } else {
                $spacecount = 0;      # reset consecutive space counter
                next;
            }
        }
    }
}
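For comparison, a less C-ish sketch of the same step using a regex and the built-in `@-` array, which holds the start offsets of the last successful match. `header_starts` is a hypothetical helper name, and the header string is passed in as an argument rather than read from `$file`. As in the loop above, a single whitespace character is treated as part of a header and two or more as a column separator.

```perl
use strict;
use warnings;

# Hypothetical helper: return the starting offset of each column header.
# A header may contain single whitespace characters; two or more in a row
# end the header.
sub header_starts {
    my ($header) = @_;
    my @starting;
    while ($header =~ /\S+(?:\s\S+)*/g) {
        push @starting, $-[0];    # $-[0] is the offset where this match began
    }
    return @starting;
}

my @starting = header_starts("First  Second  Third     Forth\n");
print "@starting\n";    # 0 7 15 25
```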