perl sort with wrap around support question

Time: 2011-06-07 15:17:26

Tags: perl sorting awk

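This is a follow-up to my earlier "perl sort question". The final solution proposed there works fine, but the logs I want to sort contain several places where the sort key wraps around. In that case I don't want the next segment of the log, the one starting with 0x000xxxx, to go to the top of the output file; see the input sample and the desired output below. Note that a sort key can be repeated in two different segments, and that the segment boundary is not easy to find: some 0xffxxxxx entries will pop up after a bunch of 0x000xxxx ones. Any ideas? See "perl sort question" for reference. The current code, without wrap support, is: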

#!/usr/bin/perl -w
my $line;
my $lastkey;
my %data;
while($line = <>) {
  chomp $line;
  if ($line =~ /\b(0x\p{AHex}{8})\b/) {
    # Begin a new entry
    #my $unique_key = $1 . $.; # cred to [Brian Gerard][3] for uniqueness
    my $unique_key = hex($1);
    $data{$unique_key} = $line;
    $lastkey = $unique_key;
  } else {
    # Continue an old entry
    $data{$lastkey} .= $line;
  }
}
print $data{$_}, "\n" for (sort { $a <=> $b } keys %data);

Sample of the input and of the output I want to achieve:

**input sample**
[2011-05-30 0xfff7ecf9=(bfn:4095,
[2011-05-30 0xfff80176=(bfn:4095,
[2011-05-30 0xfff8db3a=(bfn:4095,
[2011-05-30 0x00005686=(bfn:0,
[2011-05-30 0x00006b05=(bfn:0,
[2011-05-30 0xfff8c698=(bfn:4095,
[2011-05-30 0x00014692=(bfn:0,
[2011-05-30 0x00026537=(bfn:0,
[2011-05-30 0xfff80215=(bfn:4095,
[2011-05-30 0x00026f87=(bfn:0,
[2011-05-30 0x00027754=(bfn:0,

< thousands of lines from 0x000xxxxx to 0xfffxxxxx>

<next wrap zone>
[2011-05-30 0xfff709b4=(bfn:4095,
[2011-05-30 0xfff804f5=(bfn:4095,
[2011-05-30 0x00015af8=(bfn:0,
[2011-05-30 0x00016744=(bfn:0,
[2011-05-30 0xfff8e783=(bfn:4095,
[2011-05-30 0x00007744=(bfn:0,
[2011-05-30 0x0002368c=(bfn:0,
[2011-05-30 0x00024d0d=(bfn:0,
[2011-05-30 0x000326ae=(bfn:0,
[2011-05-30 0x00034ff3=(bfn:0,

< thousands of lines from 0x000xxxxx to 0xfffxxxxx>
< and so on >

 **desired output**

[2011-05-30 0xfff7ecf9=(bfn:4095,
[2011-05-30 0xfff80176=(bfn:4095,
[2011-05-30 0xfff80215=(bfn:4095,
[2011-05-30 0xfff8c698=(bfn:4095,
[2011-05-30 0xfff8db3a=(bfn:4095,
[2011-05-30 0x00005686=(bfn:0,
[2011-05-30 0x00006b05=(bfn:0,
[2011-05-30 0x00014692=(bfn:0,
[2011-05-30 0x00026537=(bfn:0,
[2011-05-30 0x00026f87=(bfn:0,
[2011-05-30 0x00027754=(bfn:0,

< thousands of sorted lines from 0x000xxxxx to 0xfffxxxxx>

[2011-05-30 0xfff709b4=(bfn:4095,
[2011-05-30 0xfff804f5=(bfn:4095,
[2011-05-30 0xfff8e783=(bfn:4095,
[2011-05-30 0x00007744=(bfn:0,
[2011-05-30 0x00015af8=(bfn:0,
[2011-05-30 0x00016744=(bfn:0,
[2011-05-30 0x0002368c=(bfn:0,
[2011-05-30 0x00024d0d=(bfn:0,
[2011-05-30 0x000326ae=(bfn:0,
[2011-05-30 0x00034ff3=(bfn:0,

and so on

Sample of log out of order

[2011-06-06 20:15:48.058200] 0xefe29556=(bfn:3838, sfn:766, sf:2.73, bf:85) / BIN_SEND :  (402) <=  UNKNOWN (sessionRef=0x2)
testSign {
  sigNo = 352785671
  transactionNo = 39027
  cellId = 0
  yrdty = 0
}

[2011-06-06 20:15:48.058468] 0xefe2d262=(bfn:3838, sfn:766, sf:3.00, bf:38) / BIN_REC : BB_SWU_INTERNAL_TIMEOUT2_IND (43) <=   (sessionRef=0x0)
0000 00 00 00 20 00 00 00 3e 00 08 00 05 00 01 00 01  '... ...>........'
0010 00 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00  '................'
0020 00 0f 00 00 05 06 05 07  '........'
(Unknown signal BB_SWU_INTERNAL_TIMEOUT2_IND)



[2011-06-06 20:15:48.058669] 0xefe:30316=(bfn:3838, sfn:766, sf:3.20, bf:49) 1/ BIN_REC :  (525) <=  UNKNOWN (sessionRef=0xa67b0)
testSign {
  sigNo = 23070220
  header {
    cellId = 0
   sfn = 766
   subFrameNo = 2
  }

  reportList[0] {
 {
   = 1
  bbef = 32 (0x00000020)
  isDtx { isDtx = 0 }
   {  = 1,  = 0,  = 3, nrOfTb = 1, padding0 = 0 }
  rxPower { prbListStart = 0, prbListEnd = 0, rxPowerReport = -1146, sinr = 64 }
  timingAdvanceError { timingAdvanceError = 0 }
  cfrPucch { cfrInfo { ri = 0, cfrLength = 0, cfrFormat = 0, cfrValid = 0, cfrExpected = 0, cfrCrcFlag = 0 }, cfr[] = [0, 0] as hex: [00 00 00 00] }
  }
 } 
}

[2011-06-06 20:15:48.055118] 0xefd91f8b=(bfn:3837, sfn:765, sf:9.67, bf:248) 4/_hInd LEVEL3 .c:1035: <!68!> cellId=0 subframeNr=1 : Combinded pdcchInd DL: pdcch=0: rnti=62 cceIndex=0 nrOfCce=8 nrOfRbaBits=25 startRbaBit=1 rbaBits=4294967168 dciFormat=6 nrOfPayloadBit=26 dciMsg={0x2300 0x7a80} =6 swapFlag=0 mcs={0 29} rv={1 2} ndi={0 0} pucchTpc=1
[2011-06-06 20:15:48.057932] 0xefe2586b=(bfn:3838, sfn:766, sf:2.47, bf:134) 4/ LEVEL2 .c:320: <!.118!> cellId=0 =20 subframeNr=4 : Assigned SE PQ : rnti=62 PQ(lcid=3 pqWeight=16530951 assignableBits=2664560 minPduSize=56)
 [2011-06-06 20:15:48.057932] 0xefe25a28=(bfn:3838, sfn:766, sf:2.47, bf:162) 4/_hInd LEVEL2 gchind.c:81: <!.19!> cellId=0 Receive UL PdcchInd - msg cellNo=0 msg subframe=0 dl subframe=4 msg len=4
 [2011-06-06 20:15:48.058066] 0xefe271d9=(bfn:3838, sfn:766, sf:2.60, bf:29) 4/_hInd LEVEL2 ce_schedsession_ovll1transjobready.c:149: <!.112!> L1Trans is ready and Ul pdcchInd is available -launching combinePdcch FO
 [2011-06-06 20:15:48.058066] 0xefe273b0=(bfn:3838, sfn:766, sf:2.60, bf:59) 4/ LEVEL3 ce_l1transfo.c:829: <!.76!> cellId=0 gRef=20 : Selected SE and PQ: rnti=62 PQ=1 lcid=1 assignableBits=0 assignedBits={0 0} minPduSize=56
 [2011-06-06 20:15:48.058066] 0xefe2744b=(bfn:3838, sfn:766, sf:2.60, bf:68) 4/ LEVEL3 ce_l1transfo.c:829: <!.76!> cellId=0 gRef=20 : Selected SE and PQ: rnti=62 PQ=2 lcid=2 assignableBits=0 assignedBits={0 0} minPduSize=56
 [2011-06-06 20:15:48.058066] 0xefe276b1=(bfn:3838, sfn:766, sf:2.60, bf:107) 4/_hInd LEVEL2 .c:857: <!.xxb!> tempNBundle=0 [0]=0 tempStoredNbundled[1]=9385 tempStoredNbundled[2]=0 tempStoredNbundled[3]=10698 dai=3, dciFormat=6, gRef=20

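1 answer:

Answer 0 (score: 3)

Something like this. I tried to write up a description, but I think the code ended up clearer than the description would have been.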

#!/usr/bin/perl -w
use strict;
my $lastkey;
my %data;

# how much the value can be less than the previously seen maximum
# before it's considered to have jumped forward (where "less" and
# "maximum" are cognizant of wrap-around)
my $max_overlap = 0x40000000;

# we will process the file in chunks, reading while the values
# are between min_value and max_value, and then processing what
# we've read that's not within the max overlap of the new value
my $min_value;
my $max_value;
my $first_line = <>;
if ( defined $first_line && $first_line =~ /\b(0x\p{AHex}{8})\b/ ) {
    my $unique_key = hex($1);
    ($min_value, $max_value) = range_around($unique_key, $max_overlap);

    $data{$unique_key} = $first_line;
    $lastkey = $unique_key;
}
else {
    die "horribly";
}

while(my $line = <>) {
    if ($line =~ /\b(0x\p{AHex}{8})\b/) {
        # Begin a new entry
        my $unique_key = hex($1);

        unless (in_range($unique_key, $min_value, $max_value)) {
            # we've reached the end of a hunk; sort and output as much as we safely can
            my ($new_min_value, $new_max_value) = range_around($unique_key, $max_overlap);

            my @sorted =
                sort { $b>=$min_value <=> $a>=$min_value || $a <=> $b }
                grep in_range($_, $min_value, $new_min_value), keys %data;
            print delete @data{@sorted};

            ($min_value, $max_value) = ($new_min_value, $new_max_value);
        }

        $data{$unique_key} .= $line;
        $lastkey = $unique_key;
    } else {
        # Continue an old entry
        $data{$lastkey} .= $line;
    }
}

print @data{ sort { $b>=$min_value <=> $a>=$min_value || $a <=> $b } keys %data };

# check if value is between min and max, where max may be wrapped
sub in_range {
    my ($value, $min, $max) = @_;
    if ($max > $min) {
         return $value >= $min && $value <= $max;
    }
    else {
         return $value >= $min || $value <= $max;
    }
}

# calculate a range a given increment around a value
sub range_around {
    my ($value, $increment) = @_;
    my $min_value = $value - $increment;
    if ($min_value < 0) { ($min_value += 0xffffffff) +=1 }
    my $max_value = $value + $increment;
    if ($max_value > 0xffffffff) { ($max_value -= 0xffffffff) -= 1 }
    return ($min_value, $max_value);
}
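
As a quick check of the wrap-around arithmetic, here is a small sketch that calls range_around on two keys taken from the sample input. The sub is copied from the script above; the 0x40000000 increment is the same $max_overlap, and the printf formatting is only for illustration.

```perl
use strict;
use warnings;

# Copy of range_around from the script above: a window of +/- $increment
# around $value, folded back into the 32-bit range 0 .. 0xffffffff.
sub range_around {
    my ($value, $increment) = @_;
    my $min_value = $value - $increment;
    if ($min_value < 0) { ($min_value += 0xffffffff) += 1 }
    my $max_value = $value + $increment;
    if ($max_value > 0xffffffff) { ($max_value -= 0xffffffff) -= 1 }
    return ($min_value, $max_value);
}

# A key just after a wrap: the lower bound folds back to the top of the range.
printf "0x%08x .. 0x%08x\n", range_around(0x00005686, 0x40000000);
# prints 0xc0005686 .. 0x40005686

# A key just before a wrap: the upper bound folds back to the bottom.
printf "0x%08x .. 0x%08x\n", range_around(0xfff7ecf9, 0x40000000);
# prints 0xbff7ecf9 .. 0x3ff7ecf9
```

In both cases the returned maximum is numerically smaller than the minimum, which is exactly the situation the else branch of in_range handles.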

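I got rid of your chomp and the added newline; they seemed unnecessary (and even undesirable in the case of multi-line records).

A little explanation: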

sort { $b>=$min_value <=> $a>=$min_value || $a <=> $b } grep in_range($_, $min_value, $new_min_value), keys %data;

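This line goes through the keys of %data (which are the numeric identifiers) and filters them with grep, accepting only those in the range that we know is now safe to output (because, based on the record we have just read, which is itself not output at this point, we know that no later identifier will sort below $new_min_value).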
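
To make the filtering step concrete, here is a minimal sketch of the grep on its own. The in_range logic is the same as in the script; the two window bounds and the list of keys are made-up values chosen only for illustration.

```perl
use strict;
use warnings;

# Same logic as in_range in the script: when $max is below $min,
# the range is taken to wrap past 0xffffffff.
sub in_range {
    my ($value, $min, $max) = @_;
    return $max > $min
        ? ($value >= $min && $value <= $max)
        : ($value >= $min || $value <= $max);
}

# Hypothetical bounds: the old window started at 0xc0000000 and the
# record just read moves the start of the window up to 0xd0000000.
my $min_value     = 0xc0000000;
my $new_min_value = 0xd0000000;

# Hypothetical keys currently held in %data.
my @keys = (0xc5000000, 0xcfffffff, 0xd0000001, 0x00001000);

# Only keys between the old and the new minimum are safe to flush.
my @safe = grep { in_range($_, $min_value, $new_min_value) } @keys;

printf "0x%08x\n", $_ for @safe;
# prints 0xc5000000 and 0xcfffffff
```

The other two keys stay in %data, since a later record could still sort before them.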

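Then it sorts them. sort can be given a comparison block to override the default string-comparison ordering; the block is called repeatedly to compare two elements of the list, with $a and $b aliased to those elements, and it should return -1, 0, or 1 depending on which should sort first, just like the cmp and <=> operators do.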
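
For readers who have not used a comparison block before, a small sketch with made-up numbers shows the difference between the default string sort and a numeric comparison block:

```perl
use strict;
use warnings;

my @keys = (10, 9, 100, 1);

# Without a block, sort compares as strings, so "10" and "100" precede "9".
my @as_strings = sort @keys;

# A block overrides that; <=> returns -1, 0 or 1 for numeric order.
my @as_numbers = sort { $a <=> $b } @keys;

# Swapping $a and $b in the block reverses the order.
my @descending = sort { $b <=> $a } @keys;

print "@as_strings\n";   # 1 10 100 9
print "@as_numbers\n";   # 1 9 10 100
print "@descending\n";   # 100 10 9 1
```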

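Here, it sorts using two comparisons. The first checks whether one or both of the numbers being compared have wrapped around (by comparing them directly to the minimum value of the range; those that have wrapped from 0xfff... back to 0x000... will be less than $min_value). If only $a has wrapped, the first <=> will be 1 <=> 0, giving 1, which means $a sorts later than $b. If only $b has wrapped, the <=> will be 0 <=> 1, giving -1, which means $b sorts later than $a. In either case the || is short-circuited, so the second comparison is not done. When both or neither of $a and $b have wrapped, the <=> will be 0 <=> 0 or 1 <=> 1 respectively, returning 0 and causing || to try the second comparison, which simply orders $a and $b numerically.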
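
The two-stage comparison can be tried in isolation. This is a sketch, not part of the script: the four keys come from the sample input, and the $min_value here is a made-up window start. The parentheses are added for readability only; Perl already gives >= higher precedence than <=>.

```perl
use strict;
use warnings;

# Hypothetical start of the current window; keys below it have wrapped.
my $min_value = 0xc0000000;

# Two keys from before the wrap and two from after it, in mixed order.
my @keys = (0x00006b05, 0xfff8db3a, 0x00005686, 0xfff7ecf9);

my @sorted = sort {
    ($b >= $min_value) <=> ($a >= $min_value)   # unwrapped keys sort first
        || $a <=> $b                            # ties fall back to numeric order
} @keys;

printf "0x%08x\n", $_ for @sorted;
# prints 0xfff7ecf9, 0xfff8db3a, 0x00005686, 0x00006b05
```

A plain sort { $a <=> $b } on the same list would put the two 0x0000.... keys first, which is the behaviour the question is trying to avoid.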