Perl - Extract sequences of numbers with an offset from an array

Asked: 2014-06-26 13:27:41

Tags: arrays perl extract sequence

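I am trying to find series of numbers in an array of integers. For example, if the array consists of the numbers 1,2,3,10,12,14, it could be summarized as

1 to 3 with offset 1,

10 to 14 with offset 2

In my code below, I iterate over the array starting from the second element, track the offset between consecutive array elements, and create a new "series" whenever the offset changes: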

use strict;
use warnings;

my @numbers = (1,2,3,10,12,14); #array to extract series from 
my $last_diff;
my $start = $numbers[0];
my $end;
my @all_series; #array will hold all information on series
for my $i (1..($#numbers+1)){
        my $diff;
        if ($i <($#numbers+1)){
                $diff = $numbers[$i] - $numbers[$i-1];
        }
        if (!$diff || ( $last_diff && ($last_diff != $diff)) ) {
                $end = $numbers[$i-1];
                my $series = { 'start'=> $start,
                            'end'  => $end,
                            'offset'=> $start == $end ? 1 : $last_diff,
                };
                push @all_series, $series;
                $start = $numbers[$i];
        }
        $last_diff = $diff;
}

use Data::Dumper;
print Dumper(@all_series);

The output is:

$VAR1 = {
          'offset' => 1,
          'end' => 3,
          'start' => 1
        };
$VAR2 = {
          'offset' => 1,
          'end' => 10,
          'start' => 10
        };
$VAR3 = {
          'offset' => 2,
          'end' => 14,
          'start' => 12
        };

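This is not the desired result, because the last two series could be merged into one (10 to 14 with offset 2, instead of two series).

The flaw in the algorithm is not specific to Perl, but maybe someone can give me a hint on how best to approach this; perhaps there are some Perl-specific tricks.

In my application, all integers in the array are in ascending order, and there are no duplicate numbers.

Edit: If single numbers occur that cannot be merged into a series, each of them should be a series of length one.

The more numbers that can be summarized into series, the better (I want to minimize the number of series!)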

2 Answers:

Answer 0: (score: 3)

The problem is the ternary operator. If you use a plain

offset => $last_diff,
you will notice that you get

$VAR2 = {
          'offset' => 7,
          'end' => 10,
          'start' => 10

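which is correct, in a way. To avoid it, you can undef $diff after pushing to @all_series. That produces the expected output for your case, but it still treats 1 2 3 7 10 12 14 as three sequences, starting at 1, 7, and 12. What you need now is some way to make the longer sequences greedy.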
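To make that behaviour concrete, here is a minimal sketch of the question's loop with the plain offset and with $diff reset after each push. The sub name find_series is hypothetical and only wraps the loop for illustration; it is not the full solution.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): the question's loop with
# a plain offset and $diff reset after every push.
sub find_series {
    my @numbers = @_;
    my ($last_diff, @all_series);
    my $start = $numbers[0];
    for my $i (1 .. $#numbers + 1) {
        my $diff;
        $diff = $numbers[$i] - $numbers[$i - 1] if $i < $#numbers + 1;
        if (!$diff || ($last_diff && $last_diff != $diff)) {
            push @all_series, {
                start  => $start,
                end    => $numbers[$i - 1],
                offset => $last_diff,    # undef for a series of length one
            };
            $start = $numbers[$i];
            undef $diff;    # the next series starts without a remembered offset
        }
        $last_diff = $diff;
    }
    return @all_series;
}

for my $list ([1, 2, 3, 10, 12, 14], [1, 2, 3, 7, 10, 12, 14]) {
    print join(', ', map { "$_->{start}..$_->{end}" } find_series(@$list)), "\n";
}
```

This prints 1..3, 10..14 for the first list and 1..3, 7..10, 12..14 for the second. The second result shows the remaining weakness: 7 is paired with 10 even though 10, 12, 14 would form a longer series.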

I tried the following, but you should test it more thoroughly:

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

my @numbers = (1, 2, 3, 10, 12, 14);
my $last_diff;
my $start = $numbers[0];
my @all_series;
for my $i (1 .. $#numbers + 1) {
    my $diff;
    if ($i < $#numbers + 1) {
        $diff = $numbers[$i] - $numbers[ $i - 1 ];
    }

    # Merge with the last number from the previous series if needed:
    if (!$last_diff # Just starting a new series.
        and $i > 2  # Far enough to have preceding numbers.
        and $diff and $diff == $numbers[ $i - 1 ] - $numbers[ $i - 2 ]
       ) {
        $all_series[-1]{end} = $numbers[ $i - 3 ];
        $all_series[-1]{offset} = 0 if $all_series[-1]{start} == $all_series[-1]{end};
        $start = $numbers[ $i - 2 ];
    }

    if (! $diff or ( $last_diff && ($last_diff != $diff)) ) {
        push @all_series, { start  => $start,
                            end    => $numbers[ $i - 1 ],
                            offset => $last_diff,
                          };
        $start = $numbers[$i];
        undef $diff;
    }
    $last_diff = $diff;
}

print Dumper(@all_series);

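Answer 1: (score: 3)

This is easiest to solve if it is done in three separate steps:

  • Determine the differences between each pair of adjacent numbers.
  • Determine the sequences of equal differences.
  • Finally, determine the ranges.

Doing each of these steps separately makes it easier to debug whether each one is correct. Also, for certain inputs such as 1,7,8,9, it is necessary to look ahead three numbers to decide whether 7 should be grouped with 1. Computing all of the information in advance therefore makes it easier to work out and express the rules needed to build the ranges in the final loop.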
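As an illustration of the first two steps, the following sketch prints the intermediate values for 1,7,8,9. The sub names differences and runs are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: difference between each pair of adjacent numbers.
sub differences {
    my @nums = @_;
    return map { $nums[$_ + 1] - $nums[$_] } 0 .. $#nums - 1;
}

# Hypothetical helper: collapse equal consecutive differences into runs.
sub runs {
    my @runs;
    for my $diff (@_) {
        if (@runs && $runs[-1]{diff} == $diff) {
            $runs[-1]{count}++;
        } else {
            push @runs, { diff => $diff, count => 1 };
        }
    }
    return @runs;
}

my @diffs = differences(1, 7, 8, 9);
print "diffs: @diffs\n";
print "runs:  ", join(' ', map { "$_->{diff}x$_->{count}" } runs(@diffs)), "\n";
```

This prints diffs: 6 1 1 and runs: 6x1 1x2. A run of length one followed by a longer run is the signal that the first number (1 here) stands alone and that 7 belongs to the series that follows.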

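To make the output easier to read, I show solitary numbers as just a start value. I also added a count to the range hash. Both choices can easily be changed later.

As additional test data, I added a sequence with a solitary 1 followed by a series of 3 numbers, and I also added a Fibonacci sequence as a challenge.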

use strict;
use warnings;

use Data::Dump;

while (<DATA>) {
    chomp;
    my @nums = split ',';

    my @diffs = map {$nums[$_+1] - $nums[$_]} (0..$#nums-1);

    my @seq;
    for (@diffs) {
        if (@seq && $seq[-1]{diff} == $_) {
            $seq[-1]{count}++;
        } else {
            push @seq, {diff => $_, count => 1};
        }
    }

    my @ranges;
    for (my $i = 0; $i < @nums; $i++) {
        my $seq = shift @seq;

        # Solitary Number
        if (!$seq || ($seq->{count} == 1 && @seq && $seq[0]{count} > 1)) {
            push @ranges, {start => $nums[$i]};

        # Confirmed Range
        } else {
            push @ranges, {
                start  => $nums[$i],
                end    => $nums[$i + $seq->{count}],
                count  => $seq->{count} + 1,    # Can be commented out
                offset => $seq->{diff},
            };
            $i += $seq->{count};
            shift @seq if @seq && !--$seq[0]{count};
        }
    }

    dd @nums;
    dd @ranges;
    print "\n";
}

__DATA__
1,2,3,10,12,14
1,2,3,5,7
1,7,8,9
1,2,3,7,8,11,13,15,22,100,150,200
2,3,5,8,13,21,34,55,89

Output:

(1, 2, 3, 10, 12, 14)
(
  { count => 3, end => 3, offset => 1, start => 1 },
  { count => 3, end => 14, offset => 2, start => 10 },
)

(1, 2, 3, 5, 7)
(
  { count => 3, end => 3, offset => 1, start => 1 },
  { count => 2, end => 7, offset => 2, start => 5 },
)

(1, 7, 8, 9)
(
  { start => 1 },
  { count => 3, end => 9, offset => 1, start => 7 },
)

(1, 2, 3, 7, 8, 11, 13, 15, 22, 100, 150, 200)
(
  { count => 3, end => 3, offset => 1, start => 1 },
  { count => 2, end => 8, offset => 1, start => 7 },
  { count => 3, end => 15, offset => 2, start => 11 },
  { start => 22 },
  { count => 3, end => 200, offset => 50, start => 100 },
)

(2, 3, 5, 8, 13, 21, 34, 55, 89)
(
  { count => 2, end => 3, offset => 1, start => 2 },
  { count => 2, end => 8, offset => 3, start => 5 },
  { count => 2, end => 21, offset => 8, start => 13 },
  { count => 2, end => 55, offset => 21, start => 34 },
  { start => 89 },
)