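I have a file containing times and some data. I need to sort the array by time, but there are two complications: if the time is 23:59:59:999 or Null, the row needs to stay where it is in relation to everything else. I've already read the file into an array.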
For example:
data1 10:25:34.001 data2
data1 10:25:34.002 data2
data1 10:25:34.005 data2
data1 Null data2
data1 Null data2
data1 Null data2
data1 10:25:34.006 data2
data1 10:25:34.007 data2
data1 10:25:34.008 data2
data1 23:59:59:999 data2
data1 23:59:59:999 data2
data1 10:25:34.003 data2
data1 10:25:34.004 data2
data1 10:25:34.010 data2
data1 10:25:34.011 data2
should become:
data1 10:25:34.001 data2
data1 10:25:34.002 data2
data1 10:25:34.003 data2
data1 10:25:34.004 data2
data1 10:25:34.005 data2
data1 Null data2
data1 Null data2
data1 Null data2
data1 10:25:34.006 data2
data1 10:25:34.007 data2
data1 10:25:34.008 data2
data1 23:59:59:999 data2
data1 23:59:59:999 data2
data1 10:25:34.010 data2
data1 10:25:34.011 data2
What I've done so far is split them into separate "blocks" (times, Nulls, etc.), so when I dump the data array it looks like:
(
[
[
"data1",
"10:25:34.001",
"data2",
],
[
"data1",
"10:25:34.002",
"data2",
],
[
"data1",
"10:25:34.005",
"data2",
],
],
[
[
"data1",
"Null",
"data2",
],
[
"data1",
"Null",
"data2",
],
[
"data1",
"Null",
"data2",
],
],
[
[
"data1",
"10:25:34.006",
"data2",
],
[
"data1",
"10:25:34.007",
"data2",
],
[
"data1",
"10:25:34.008",
"data2",
],
],
[
[
"data1",
"23:59:59:999",
"data2",
],
[
"data1",
"23:59:59:999",
"data2",
],
],
[
[
"data1",
"10:25:34.003",
"data2",
],
[
"data1",
"10:25:34.004",
"data2",
],
[
"data1",
"10:25:34.010",
"data2",
],
[
"data1",
"10:25:34.011",
"data2",
],
]
)
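My idea was to find the max of the "first" block, ignore the "Null" and "23:59:59:999" blocks, and push any values in the remaining blocks that are greater than the max of the "first" block.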
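I'm having trouble moving the times between the separate blocks and then sorting them, and was wondering whether anyone has suggestions on how to do this (or whether there's a better way to sort this out from the structure I have so far).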
Answer 0 (score: 3)
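To achieve what you want, you can't sort the original data structure directly. You need to transform it so that the rows whose relative position you want to keep are attached to the element that precedes them.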
Here is an updated version that is a bit better than the first implementation I posted (see the edit history if you're curious):
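#!/usr/bin/env perl
use strict;
use warnings;
# Each element of @data is a group: a timed row followed by any
# Null / 23:59:59:999 rows that must stay attached to it.
# The first row starts the first group.
my @data = [ [ split ' ', scalar <DATA> ] ];
while (my $row = <DATA>) {
    next unless $row =~ /\S/;    # skip blank lines
    my @x = split ' ', $row;
    if (($x[1] eq 'Null') or ($x[1] eq '23:59:59:999')) {
        # Attach this row to the current (last) group.
        push @{ $data[-1] }, \@x;
        next;
    }
    # A timed row starts a new group.
    push @data, [ \@x ];
}
# Sort the groups by the time of their first row, then flatten them.
@data = map @$_, sort { $a->[0][1] cmp $b->[0][1] } @data;
print "@$_\n" for @data;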
__DATA__
data1 10:25:34.001 data2
data1 10:25:34.002 data2
data1 10:25:34.005 data2
data1 Null data2
data1 Null data2
data1 Null data2
data1 10:25:34.006 data2
data1 10:25:34.007 data2
data1 10:25:34.008 data2
data1 23:59:59:999 data2
data1 23:59:59:999 data2
data1 10:25:34.003 data2
data1 10:25:34.004 data2
data1 10:25:34.010 data2
data1 10:25:34.011 data2
Output:
data1 10:25:34.001 data2
data1 10:25:34.002 data2
data1 10:25:34.003 data2
data1 10:25:34.004 data2
data1 10:25:34.005 data2
data1 Null data2
data1 Null data2
data1 Null data2
data1 10:25:34.006 data2
data1 10:25:34.007 data2
data1 10:25:34.008 data2
data1 23:59:59:999 data2
data1 23:59:59:999 data2
data1 10:25:34.010 data2
data1 10:25:34.011 data2
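Since the question says the file is already loaded into an array, the grouping idea above can also be wrapped in a subroutine that takes the rows directly instead of reading DATA. This is a minimal sketch, assuming each row has already been split into [data1, time, data2]; the name sort_with_attached is hypothetical, and attached rows that appear before the first timed row are simply kept at the front.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: sort rows of the form [data1, time, data2] by time,
# keeping every 'Null' / '23:59:59:999' row attached to the timed row above it.
sub sort_with_attached {
    my @rows = @_;
    my @lead;      # attached rows that appear before any timed row
    my @groups;    # each group: [ timed_row, attached_row, ... ]
    for my $row (@rows) {
        my $time = $row->[1];
        if ( $time eq 'Null' or $time eq '23:59:59:999' ) {
            if (@groups) { push @{ $groups[-1] }, $row }    # attach to current group
            else         { push @lead, $row }               # nothing to attach to yet
        }
        else {
            push @groups, [$row];    # a timed row starts a new group
        }
    }
    # Plain string comparison is enough because every time is zero-padded.
    my @sorted = sort { $a->[0][1] cmp $b->[0][1] } @groups;
    return ( @lead, map { @$_ } @sorted );
}

my @input = map { [ split ' ' ] } (
    'data1 10:25:34.001 data2',
    'data1 10:25:34.002 data2',
    'data1 10:25:34.005 data2',
    'data1 Null data2',
    'data1 Null data2',
    'data1 Null data2',
    'data1 10:25:34.006 data2',
    'data1 10:25:34.007 data2',
    'data1 10:25:34.008 data2',
    'data1 23:59:59:999 data2',
    'data1 23:59:59:999 data2',
    'data1 10:25:34.003 data2',
    'data1 10:25:34.004 data2',
    'data1 10:25:34.010 data2',
    'data1 10:25:34.011 data2',
);

my @output = sort_with_attached(@input);
print join( ' ', @$_ ), "\n" for @output;
```

Returning @lead first covers the same case that the sentinel element below is meant to handle, without needing a special comparison in the sort block.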
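This assumes the first row of data is not a Null or midnight event. If the first row of data might be a Null or midnight event, you can use a sentinel element: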
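#!/usr/bin/env perl
use strict;
use warnings;
# Sentinel group: its undef time sorts before everything else, and it
# collects any Null / 23:59:59:999 rows that precede the first timed row.
my @data = [ [ undef, undef, undef ] ];
while (my $row = <DATA>) {
    next unless $row =~ /\S/;
    my @x = split ' ', $row;
    if (($x[1] eq 'Null') or ($x[1] eq '23:59:59:999')) {
        push @{ $data[-1] }, \@x;
        next;
    }
    push @data, [ \@x ];
}
@data = map @$_, sort {
    $a->[0][1] or return -1;    # the sentinel (undef time) sorts first
    $b->[0][1] or return 1;
    $a->[0][1] cmp $b->[0][1]
} @data;
shift @data;    # drop the sentinel row itself
print "@$_\n" for @data;
# (append the same __DATA__ section as in the previous script)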
We make sure the sentinel element sorts before every other element, and then remove it before doing anything further with the data.
Answer 1 (score: 2)
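OK, so the problem here is that when you sort something, you can't really have "exceptions": every element has to be compared against every other element to establish its relative position.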
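The best you could probably do is a custom sort function that compares the values and returns 0 if it encounters a null element...

...but that won't necessarily give the right result, because sort assumes the ordering is logically consistent, and yours won't be, since you may need to "skip" over some blocks in the middle.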
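To make the consistency problem concrete: a comparator that treats 'Null' as equal to everything breaks transitivity. This is a small sketch; the comparator name by_time_null_equal is hypothetical, and it only checks three pairwise comparisons rather than calling sort, because what sort does with such a comparator is not something to rely on.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical comparator: returns 0 whenever either side is 'Null',
# otherwise compares the times as strings.
sub by_time_null_equal {
    my ( $x, $y ) = @_;
    return 0 if $x eq 'Null' or $y eq 'Null';
    return $x cmp $y;
}

my ( $early, $null, $late ) = ( '10:25:34.003', 'Null', '10:25:34.005' );

my @results = (
    by_time_null_equal( $early, $null ),    # 0: "equal" to Null
    by_time_null_equal( $null,  $late ),    # 0: "equal" to Null
    by_time_null_equal( $early, $late ),    # -1: but not equal to each other
);
print "early vs null: $results[0], null vs late: $results[1], early vs late: $results[2]\n";
```

Since $early "equals" $null and $null "equals" $late, yet $early sorts before $late, no single ordering satisfies all three answers, so where sort leaves the Null rows depends on which pairs it happens to compare.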
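So I would do something like this: pull out the sortable rows, sort them, then walk the original list, leaving the Null/midnight rows where they are and filling every other slot with the next row from the sorted list: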
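#!/usr/bin/env perl
use strict;
use warnings;
# Read every row as [data1, time, data2].
my @list_of_stuff = map { [split] } <DATA>;
# Only rows with a real timestamp take part in the sort.
my @list_to_sort =
    grep { $_->[1] ne 'Null'
       and $_->[1] ne '23:59:59:999' }
    @list_of_stuff;
# Sort descending so that pop() hands the rows back in ascending order.
my @sorted = sort { $b->[1] cmp $a->[1] } @list_to_sort;
foreach my $row (@list_of_stuff) {
    if ( $row->[1] eq "Null"
        or $row->[1] eq '23:59:59:999' )
    {
        # Null / midnight rows keep their original position.
        print join( "\t", @$row );
    }
    else {
        # Every other slot gets the next-smallest timed row.
        print join( "\t", @{ pop @sorted } );
    }
    print "\n";
}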
__DATA__
data1 10:25:34.001 data2
data1 10:25:34.002 data2
data1 10:25:34.005 data2
data1 Null data2
data1 Null data2
data1 Null data2
data1 10:25:34.006 data2
data1 10:25:34.007 data2
data1 10:25:34.008 data2
data1 23:59:59:999 data2
data1 23:59:59:999 data2
data1 10:25:34.003 data2
data1 10:25:34.004 data2
data1 10:25:34.010 data2
data1 10:25:34.011 data2
This produces:
data1 10:25:34.001 data2
data1 10:25:34.002 data2
data1 10:25:34.003 data2
data1 Null data2
data1 Null data2
data1 Null data2
data1 10:25:34.004 data2
data1 10:25:34.005 data2
data1 10:25:34.006 data2
data1 23:59:59:999 data2
data1 23:59:59:999 data2
data1 10:25:34.007 data2
data1 10:25:34.008 data2
data1 10:25:34.010 data2
data1 10:25:34.011 data2
Note: the timestamps are sorted as strings, which works in this case, but I would normally suggest converting them to a numeric timestamp. Given that "unix time" doesn't support milliseconds, I've left that aside.
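If you do want a numeric sort key, Unix time isn't strictly needed: for timestamps within a single day, milliseconds since midnight is enough. A minimal sketch, assuming every time is HH:MM:SS followed by a three-digit millisecond field separated by either "." or ":" (the data uses both); the helper name time_to_ms is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: convert 'HH:MM:SS.mmm' (or 'HH:MM:SS:mmm') to
# milliseconds since midnight. Returns undef for anything else, e.g. 'Null'.
sub time_to_ms {
    my ($t) = @_;
    my ( $h, $m, $s, $ms ) = $t =~ /^(\d\d):(\d\d):(\d\d)[.:](\d{3})$/
        or return undef;
    return ( ( $h * 60 + $m ) * 60 + $s ) * 1000 + $ms;
}

my @times  = ( '10:25:34.010', '10:25:34.003', '23:59:59:999', '10:25:34.001' );
my @sorted = sort { time_to_ms($a) <=> time_to_ms($b) } @times;
print "$_\n" for @sorted;    # 10:25:34.001, 10:25:34.003, 10:25:34.010, 23:59:59:999
```

In the first answer's sort block, for instance, the comparison could become time_to_ms($a->[0][1]) <=> time_to_ms($b->[0][1]), assuming no sentinel row with an undef time is present.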
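OK, now for a second attempt: I've realised you're not so much looking for exceptions as for rows with an invalid value to be "hard-linked" to the value of their parent.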
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $last;
my @list_to_sort;
#iterate input data
while (<DATA>) {
    chomp;
    #extract this row into an anonymous array.
    my $stuff = [split];
    #check if it's 'valid' in its own right.
    if ( $stuff->[1] eq 'Null'
        or $stuff->[1] eq '23:59:59:999' )
    {
        #if it isn't, 'tag on' to the last element we've seen.
        #NB - if there's no valid 'first' row, $last is still undef here,
        #so push autovivifies a throwaway array and these rows are lost.
        push( @$last, $stuff );
    }
    else {
        #insert into 'list_to_sort' as [ key, row ].
        my $thing = [ $stuff->[1], $stuff ];
        push( @list_to_sort, $thing );
        #make a note of the last array ref to insert next stuff into.
        $last = $thing;
    }
}
#debugging: uncomment to inspect the structure before and after sorting.
#print Dumper \@list_to_sort;
my @sorted = sort { $a->[0] cmp $b->[0] } @list_to_sort;
#print Dumper \@sorted;
foreach my $blob (@sorted) {
    foreach my $line (@$blob) {
        #because we're 'keying' this element,
        #we don't actually need to print the 'key'.
        next unless ref $line;
        print join( "\t", @$line ), "\n";
    }
}
__DATA__
data1 10:25:34.001 data2
data1 10:25:34.002 data2
data1 10:25:34.005 data2
data1 Null data2
data1 Null data2
data1 Null data2
data1 10:25:34.006 data2
data1 10:25:34.007 data2
data1 10:25:34.008 data2
data1 23:59:59:999 data2
data1 23:59:59:999 data2
data1 10:25:34.003 data2
data1 10:25:34.004 data2
data1 10:25:34.010 data2
data1 10:25:34.011 data2
What this does is loop through the list and, if an element is "valid", insert it into the to_sort list. If it isn't, it gets appended to the last valid one.
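I'm not sure what should happen if your list doesn't have a valid first element; it breaks in that scenario, because any invalid rows before the first valid one are silently dropped.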
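One possible way to handle a list that starts with an invalid row, in the spirit of the code above, is to seed $last with a placeholder group whose key is the empty string, since '' sorts before any time under cmp. This is a minimal sketch of that idea rather than the answer's own code; the four input rows are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Placeholder group (an assumption, not part of the answer's code): its key ''
# sorts before every real time under cmp, so invalid rows that come before
# the first valid row are collected here and stay at the front.
my $last         = [''];
my @list_to_sort = ($last);

# Made-up input whose first row is invalid.
my @rows = map { [split] } (
    'data1 Null data2',
    'data1 10:25:34.002 data2',
    'data1 23:59:59:999 data2',
    'data1 10:25:34.001 data2',
);

for my $stuff (@rows) {
    if ( $stuff->[1] eq 'Null' or $stuff->[1] eq '23:59:59:999' ) {
        push @$last, $stuff;    # tag on to the last valid row (or the placeholder)
    }
    else {
        $last = [ $stuff->[1], $stuff ];    # [ key, row ]
        push @list_to_sort, $last;
    }
}

my @sorted = sort { $a->[0] cmp $b->[0] } @list_to_sort;

# Flatten the groups and keep only the row refs (skip the string keys).
my @lines = map { join( "\t", @$_ ) } grep { ref } map { @$_ } @sorted;
print "$_\n" for @lines;
```

If the first row is valid, the placeholder group only holds its '' key and contributes nothing to the output.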
Each element looks like:
[
'10:25:34.008',
[
'data1',
'10:25:34.008',
'data2'
],
[
'data1',
'23:59:59:999',
'data2'
],
[
'data1',
'23:59:59:999',
'data2'
]
],
This is a bit like a hash. I didn't use one because, although your keys are unique, I'm assuming they don't have to be.
This gives the following result:
data1 10:25:34.001 data2
data1 10:25:34.002 data2
data1 10:25:34.003 data2
data1 10:25:34.004 data2
data1 10:25:34.005 data2
data1 Null data2
data1 Null data2
data1 Null data2
data1 10:25:34.006 data2
data1 10:25:34.007 data2
data1 10:25:34.008 data2
data1 23:59:59:999 data2
data1 23:59:59:999 data2
data1 10:25:34.010 data2
data1 10:25:34.011 data2
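To illustrate the reasoning about not using a hash: if two valid rows ever share a timestamp, a hash keyed on the time keeps only one of them, whereas an array of [key, row] groups keeps both. A small sketch with made-up rows; the variable names are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up rows: the first two share a timestamp.
my @rows = map { [split] } (
    'a 10:25:34.001 x',
    'b 10:25:34.001 y',
    'c 10:25:34.002 z',
);

# Hash keyed on the time: the second 10:25:34.001 row overwrites the first.
my %by_time = map { ( $_->[1], [$_] ) } @rows;

# Array of [ key, row ] groups, as in the answer: every row survives.
my @groups = map { [ $_->[1], $_ ] } @rows;

# prints "hash groups: 2, array groups: 3"
printf "hash groups: %d, array groups: %d\n", scalar( keys %by_time ), scalar(@groups);
```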