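Hi, I'm sorting a file and want to make it easier to read by merging similar lines together. The data is already roughly sorted by the first word of each line. So far my program only reads the lines into an array and prints them.
The text file contains: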
Network ubuntu Jan 1 13:42:13 : <info> DHCP: device eth5 state changed bound -> renew Network
Network ubuntu Jan 2 13:42:42 : <info> prefix 24 (255.255.255.0) Network
Network ubuntu Jan 2 12:11:42 : <info> DHCP: device eth5 state changed bound -> renew Network
testing ubuntu Jan 1 01:13:42 : DHCPACK of 192.168.233.129 from 192.168.233.254 testing
testing ubuntu Jan 2 13:54:42 : DHCPACK of 192.168.233.129 from 192.168.233.254 testing
testing ubuntu Jan 3 13:02:42 : DHCPACK of 192.168.233.129 from 192.168.233.254 testing
My program:
#!/usr/bin/perl
use strict;
use warnings;

my $file = '/computer/testfile.txt';
open(my $info, '<', $file) or die "Cannot open $file: $!";
my @array;
while (my $line = <$info>) {
    push @array, $line;
}
close $info;
print @array;
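I want to use a regular expression to merge lines that are identical apart from their date/timestamp. Each merged line should show, in parentheses, how many lines were merged, followed by the earliest and latest date/timestamps. A line that has no similar lines should be left unchanged.
Expected final result: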
Network ubuntu Jan 2 13:42:42 : <info> prefix 24 (255.255.255.0) Network
Network ubuntu (2) Jan 1 13:42:13-Jan 2 12:11:42 : <info> DHCP: device eth5 state changed bound -> renew Network
testing ubuntu (3) Jan 1 01:13:42-Jan 3 13:02:42 : DHCPACK of 192.168.233.129 from 192.168.233.254 testing
Any help or guidance is much appreciated. Thanks!
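To make "identical apart from the date/timestamp" concrete, one option is to replace the timestamp with a fixed placeholder and compare what remains: two lines are similar exactly when those keys are equal. This is a minimal sketch, assuming timestamps always look like `Jan 1 13:42:13` as in the sample file; `similarity_key` and the `<TS>` placeholder are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Assumption: a timestamp is a month abbreviation, a day and HH:MM:SS,
# e.g. "Jan 1 13:42:13", as in the sample file.
my $ts_re = qr/\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\b/;

# Hypothetical helper: return the line with its first timestamp replaced
# by a placeholder, so similar lines produce equal keys.
sub similarity_key {
    my ($line) = @_;
    (my $key = $line) =~ s/$ts_re/<TS>/;
    return $key;
}

my $k1 = similarity_key('Network ubuntu Jan 1 13:42:13 : <info> DHCP: device eth5 state changed bound -> renew Network');
my $k2 = similarity_key('Network ubuntu Jan 2 12:11:42 : <info> DHCP: device eth5 state changed bound -> renew Network');
print $k1 eq $k2 ? "similar\n" : "different\n";    # prints "similar"
```

The key could then serve as a hash key for grouping; the placeholder keeps the rest of the line intact so it can be reprinted.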
Answer (score: 1)
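You can use Time::Piece to parse the dates. Note that the timestamps contain no year, so they cannot be sorted fully reliably: Time::Piece fills in a default year, and a log that spans a year boundary (e.g. Dec 31 followed by Jan 1) will sort in the wrong order.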
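A minimal sketch of that parsing step, assuming the `Mon D HH:MM:SS` format from the sample file (the two timestamps are taken from the `testing` lines). Because the format has no year, Time::Piece supplies a default one, so the comparison is only meaningful within a single year.

```perl
use strict;
use warnings;
use Time::Piece;

# Format of the sample timestamps; note there is no %Y (year) field.
my $fmt = '%b %d %H:%M:%S';
my $t1 = Time::Piece->strptime('Jan 1 01:13:42', $fmt);
my $t2 = Time::Piece->strptime('Jan 3 13:02:42', $fmt);

# Time::Piece overloads <=>, so parsed times compare chronologically.
my $cmp = $t1 <=> $t2;    # -1, 0 or 1
print $cmp < 0 ? "earlier\n" : "later\n";    # prints "earlier"
print $t2->epoch - $t1->epoch, " seconds apart\n";
```

The overloaded `<=>` is what lets the answer's `sort { $a->[1] <=> $b->[1] }` order the parsed objects directly.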
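Read the sorted file line by line. If a line matches the previous one apart from its timestamp, accumulate the timestamp; otherwise, output the previously accumulated group and start a new one. Because only consecutive lines are compared, lines that should be merged must be adjacent in the input: in the sample above, the two eth5 DHCP lines are separated by the prefix line and would not be merged.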
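#!/usr/bin/perl
use warnings;
use strict;
use Time::Piece;

# Print one group: the line unchanged if it occurred once, otherwise a
# merged line with the count and the earliest/latest timestamps.
sub output {
    my ($pre, $post, @timestamps) = @_;
    if (@timestamps > 1) {
        @timestamps = map $_->[0],    # Use Schwartzian Transform to sort by timestamp.
                      sort { $a->[1] <=> $b->[1] }
                      map [ $_, 'Time::Piece'->strptime($_, '%b %d %H:%M:%S') ],
                      @timestamps;
        print "$pre (", scalar @timestamps, ") ",
              $timestamps[0], '-', $timestamps[-1],
              " $post\n";
    } else {
        print "$pre $timestamps[0] $post\n";
    }
}

my @last;        # ($pre, $post) of the group being accumulated
my @timestamps;  # timestamps collected for that group
while (<>) {
    chomp;
    # Split the line into the text before the timestamp, the timestamp
    # itself (e.g. "Jan 1 13:42:13") and the text after it. The literal
    # spaces around the timestamp are part of the pattern, so no /x flag.
    my ($pre, $timestamp, $post)
        = /^(.*?) ([ADFJMNOS][aceopu][bcglnprtvy]\s+[0-9]+\s[0-9:]+) (.*)$/;
    if (!defined $timestamp) {
        # No timestamp found: flush the current group, print the line as-is.
        output(@last, @timestamps) if @timestamps;
        @last       = ();
        @timestamps = ();
        print "$_\n";
        next;
    }
    if (@last and $pre eq $last[0] and $post eq $last[1]) {
        push @timestamps, $timestamp;
    } else {
        output(@last, @timestamps) if @timestamps;
        @last       = ($pre, $post);
        @timestamps = ($timestamp);
    }
}
output(@last, @timestamps) if @timestamps;    # Don't forget to output the last batch.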
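The loop above only merges consecutive lines, so with the sample input the two eth5 DHCP lines stay separate. The sketch below is a variant, not the answerer's method: it groups lines in a hash keyed on the text before and after the timestamp, so matching lines need not be adjacent. `merge_lines` is a hypothetical name; the sketch reuses the same timestamp regex and Time::Piece sort, and it emits groups in the order their first line appears, so the prefix line comes out second rather than first as in the expected result.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Same timestamp pattern as the answer: "Jan 1 13:42:13" between the
# text before it ($pre) and the text after it ($post).
my $line_re = qr/^(.*?) ([ADFJMNOS][aceopu][bcglnprtvy]\s+[0-9]+\s[0-9:]+) (.*)$/;

# Hypothetical helper: merge lines that differ only in their timestamp,
# even when they are not adjacent. Groups keep first-seen order.
sub merge_lines {
    my @lines = @_;
    my (%group_for, @groups);
    for my $line (@lines) {
        my ($pre, $ts, $post) = $line =~ $line_re;
        unless (defined $ts) {
            push @groups, { raw => $line };    # no timestamp: keep as-is
            next;
        }
        my $key = "$pre\0$post";               # assumes "\0" never occurs in log text
        unless ($group_for{$key}) {
            $group_for{$key} = { pre => $pre, post => $post, ts => [] };
            push @groups, $group_for{$key};
        }
        push @{ $group_for{$key}{ts} }, $ts;
    }

    my @out;
    for my $g (@groups) {
        if (exists $g->{raw}) {
            push @out, $g->{raw};
            next;
        }
        my @ts = @{ $g->{ts} };
        if (@ts == 1) {
            push @out, "$g->{pre} $ts[0] $g->{post}";    # single line: unchanged
            next;
        }
        # Schwartzian Transform: sort the group's timestamps chronologically.
        my @sorted = map  { $_->[0] }
                     sort { $a->[1] <=> $b->[1] }
                     map  { [ $_, Time::Piece->strptime($_, '%b %d %H:%M:%S') ] } @ts;
        push @out, sprintf '%s (%d) %s-%s %s',
            $g->{pre}, scalar @sorted, $sorted[0], $sorted[-1], $g->{post};
    }
    return @out;
}

my @sample = (
    'Network ubuntu Jan 1 13:42:13 : <info> DHCP: device eth5 state changed bound -> renew Network',
    'Network ubuntu Jan 2 13:42:42 : <info> prefix 24 (255.255.255.0) Network',
    'Network ubuntu Jan 2 12:11:42 : <info> DHCP: device eth5 state changed bound -> renew Network',
    'testing ubuntu Jan 1 01:13:42 : DHCPACK of 192.168.233.129 from 192.168.233.254 testing',
    'testing ubuntu Jan 2 13:54:42 : DHCPACK of 192.168.233.129 from 192.168.233.254 testing',
    'testing ubuntu Jan 3 13:02:42 : DHCPACK of 192.168.233.129 from 192.168.233.254 testing',
);
print "$_\n" for merge_lines(@sample);
```

The trade-off is memory: the hash holds every distinct group until the end of input, whereas the streaming loop only keeps the current group.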