我有一个文本文件(基本上是一个包含日期,时间戳和一些数据的错误日志),格式如下:
mm/dd/yy 12:00:00:0001
This is line 1
This is line 2
mm/dd/yy 12:00:00:0004
This is line 3
This is line 4
This is line 5
mm/dd/yy 12:00:00:0004
This is line 6
This is line 7
我是Perl的新手,需要编写一个脚本来搜索文件中的时间戳,并合并其中包含相同时间戳的数据。
我期待以上示例的以下输出。
mm/dd/yy 12:00:00:0001
This is line 1
This is line 2
mm/dd/yy 12:00:00:0004
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
完成这项工作的最佳方法是什么?
答案 0 :(得分:4)
我之前必须在一些非常大的文件上执行此任务,并且时间戳没有按顺序排列。我不想把它全部存储在内存中。我通过使用三遍解决方案完成了任务:
这对我的任务来说足够快,我可以让它在我去喝杯咖啡时运行,但是如果你真的很快就需要结果,你可能需要做更多的事情。
use strict; use warnings; use File::Temp qw(tempfile); my( $temp_fh, $temp_filename ) = tempfile( UNLINK => 1 ); # read each line, tag with timestamp, and write to temp file # will sort and undo later. my $current_timestamp = ''; LINE: while( <DATA> ) { chomp; if( m|^\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d\d\d\d$| ) # timestamp line { $current_timestamp = $_; next LINE; } elsif( m|\S| ) # line with non-whitespace (not a "blank line") { print $temp_fh "[$current_timestamp] $_\n"; } else # blank lines { next LINE; } } close $temp_fh; # sort the file by lines using some very fast sorter system( "sort", qw(-o sorted.txt), $temp_filename ); # read the sorted file and turn back into starting format open my($in), "<", 'sorted.txt' or die "Could not read sorted.txt: $!"; $current_timestamp = ''; while( <$in> ) { my( $timestamp, $line ) = m/\[(.*?)] (.*)/; if( $timestamp ne $current_timestamp ) { $current_timestamp = $timestamp; print $/, $timestamp, $/; } print $line, $/; } unlink $temp_file, 'sorted.txt'; __END__ 01/01/70 12:00:00:0004 This is line 3 This is line 4 This is line 5 01/01/70 12:00:00:0001 This is line 1 This is line 2 01/01/70 12:00:00:0004 This is line 6 This is line 7
答案 1 :(得分:2)
如果日志文件不是太大而无法保留在内存中,则可以保留日期字符串=&gt;的哈希值。文本。像这样:
my %h;
my $cur = "*** No date ***";
while(<>) {
if (m"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d{4})") {
$cur = $1;
} else {
$h{$cur} .= $_ unless /^\s*$/;
}
}
print "$_\n$h{$_}\n" foreach (sort keys %h);
你要把它保存为t.pl并按照perl t.pl&lt;运行它。 yourlog.txt。 如果需要,调整正则表达式。
答案 2 :(得分:1)
如果输入很大,最好分两个阶段执行此操作:使用单个表创建一个SQLite数据库,该表包含一个表,其中包含时间戳和行的列(可能还有行号和文件名)。然后,您可以以任何方式输出数据。
答案 3 :(得分:0)
考虑这个解决方案......
#!/usr/bin/perl
use strict;
my (%time, $id);
while (<DATA>) {
if ( /^mm/ ... /\n\n/ ) {
chomp;
s/^mm\/dd\/yy\s(.*)// and $id = $1;
next if ( /^mm/ || /^$/ );
push (@{$time{$id}}, $_);
}
}
for my $i ( keys %time ) {
print "mm/dd/yy $i\n";
for my $j ( @{$time{$i}} ) {
print "$j\n";
}
print "\n";
}
__DATA__
mm/dd/yy 12:00:00:0001
This is line 1
This is line 2
mm/dd/yy 12:00:00:0004
This is line 3
This is line 4
This is line 5
mm/dd/yy 12:00:00:0004
This is line 6
This is line 7