将这个问题归结为25年前我所知道的事情,并忘了......
我有来自Windows事件日志的日志输出,并且无法控制时间戳格式(如果我这样做,我会选择像YYYYMMDD HH24MMSS那样合理的东西,这样当它被视为字符串时它很容易排序。
我确信使用sed或某种排序参数可以轻松实现此目的。有没有人有这方面的快速解决方案?
示例数据:
SERVER01,1/1/2013 12:00:01 AM,8,FOO,TOO
SERVER01,4/10/2012 4:43:06 PM,8,FOO,TOO
SERVER01,4/11/2012 4:43:06 PM,8,FOO,TOO
SERVER01,4/9/2012 4:43:06 PM,8,FOO,TOO
SERVER02,12/31/2012 11:59:59 PM,8,FOO,TOO
SERVER02,4/10/2012 4:43:06 PM,8,FOO,TOO
SERVER02,4/9/2012 4:43:06 PM,8,FOO,TOO
所需订单:
SERVER01,4/9/2012 4:43:06 PM,8,FOO,TOO
SERVER02,4/9/2012 4:43:06 PM,8,FOO,TOO
SERVER01,4/10/2012 4:43:06 PM,8,FOO,TOO
SERVER02,4/10/2012 4:43:06 PM,8,FOO,TOO
SERVER01,4/11/2012 4:43:06 PM,8,FOO,TOO
SERVER02,12/31/2012 11:59:59 PM,8,FOO,TOO
SERVER01,1/1/2013 12:00:01 AM,8,FOO,TOO
重新格式化时间戳是正常的,甚至是可取的。我只是不知道如何。 这需要在Windows上运行,我可以使用Cygwin(我已经在同一个文件上使用它进行grep过滤)。
答案 0 :(得分:2)
这是一个Perl脚本,它为每行添加了一个可排序的时间戳:
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
my $timestamp = (split /,/)[1];
my($mon, $mday, $year, $hour, $min, $sec, $ampm) =
$timestamp =~ m{^(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+)\s+(AM|PM)};
die "Can't parse timestamp on line $.\n" if not defined $ampm;
if ($ampm eq 'AM') {
$hour = 0 if $hour == 12;
}
else {
$hour += 12 if $hour != 12;
}
printf "%04d-%02d-%02d %02d:%02d:%02d,%s",
$year, $mday, $mon, $hour, $min, $sec, $_;
}
对于样本数据,它会产生以下输出:
2013-01-01 00:00:01,SERVER01,1/1/2013 12:00:01 AM,8,FOO,TOO
2012-10-04 16:43:06,SERVER01,4/10/2012 4:43:06 PM,8,FOO,TOO
2012-11-04 16:43:06,SERVER01,4/11/2012 4:43:06 PM,8,FOO,TOO
2012-09-04 16:43:06,SERVER01,4/9/2012 4:43:06 PM,8,FOO,TOO
2012-31-12 23:59:59,SERVER02,12/31/2012 11:59:59 PM,8,FOO,TOO
2012-10-04 16:43:06,SERVER02,4/10/2012 4:43:06 PM,8,FOO,TOO
2012-09-04 16:43:06,SERVER02,4/9/2012 4:43:06 PM,8,FOO,TOO
要按日期对样本数据进行排序,假设上述Perl脚本为foo.pl
:
./foo.pl in.txt | sort | sed 's/^[^,]*,//'
这会产生与您问题中指定输出相同的输出。
如果您愿意,可以对Perl脚本进行一些小调整,以避免sort
和sed
命令,但代价是在整个内存中存储,修改和排序,这可能是输入非常大的问题:
#!/usr/bin/perl
use strict;
use warnings;
my @lines = ();
while (<>) {
my $timestamp = (split /,/)[1];
my($mon, $mday, $year, $hour, $min, $sec, $ampm) =
$timestamp =~ m{^(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+)\s+(AM|PM)};
die "Can't parse timestamp on line $.\n" if not defined $ampm;
if ($ampm eq 'AM') {
$hour = 0 if $hour == 12;
}
else {
$hour += 12 if $hour != 12;
}
push @lines, sprintf "%04d-%02d-%02d %02d:%02d:%02d,%s",
$year, $mday, $mon, $hour, $min, $sec, $_;
}
@lines = sort @lines;
foreach (@lines) {
s/^[^,]*,//;
}
print @lines;
答案 1 :(得分:2)
awk绝对带有cygwin,它可以将日期转换为可排序的格式到行的前面(我要退出新手到awk,所以这很难看我确定但是它有效),所以那么您可以将日志记录到此脚本中,然后进入简单的排序
#! /bin/awk -f
BEGIN {
FS=",";
}
{
linedate=$2;
split(linedate,datetime," ");
split(datetime[1],datepieces,"/");
date=sprintf( "%d/%02d/%02d", datepieces[3], datepieces[1], datepieces[2]);
split(datetime[2],timepieces,":");
time=sprintf( "%02d:%02d:%02d", timepieces[1], timepieces[2], timepieces[3] );
print date " " time " " datetime[3] "," $1 "," $3 "," $4 "," $5;
}
答案 2 :(得分:1)
我不得不做这样的事情 - 我有多个日志文件需要合并log4j时间戳。
我决定采用的解决方案是使用gawk
将时间戳转换为毫秒 - 从纪元开始,并为所有行添加前缀。之后使用sort
很简单。
我转换为以上格式,因为我还想对t9imestamp值进行一些算术运算。您可以使用快捷方式转换为yymmddXhhmmss
中的sed
。 X
用于am/pm
0
am
1
pm
进一步考虑,使用gawk
而不是sed
也会更好,这样您就可以使用printf
获得零填充数字。
答案 3 :(得分:1)
试试这个unix命令。
我只为时间戳部分做过。
输入
1/1/2013 12:00:01 AM
4/10/2012 4:43:06 PM
4/9/2012 4:43:06 PM
12/31/2012 11:59:59 PM
4/10/2012 4:43:06 PM
4/9/2012 4:43:06 PM
Unix命令
$>sort -t "/" -k 1.8,1.4 Input| sort -t ":" -r -k 1 -k 2.1,2.2 -k 3.1,3.2 | sort -t " " -r -k 3.1
<强>输出强>
4/9/2012 4:43:06 PM
4/9/2012 4:43:06 PM
4/10/2012 4:43:06 PM
4/10/2012 4:43:06 PM
12/31/2012 11:59:59 PM
1/1/2013 12:00:01 AM
您可以根据自己的要求修改脚本。
答案 4 :(得分:1)
克里斯,
您可能需要使用下面提供的代码,特别是查看sort命令。
我编写的awk脚本清理Windows Server 2003“timestamp”,以便用0填充单个数字。可以很容易地改变生成的理智时间戳的格式。
应该使用默认的cygwin安装。
让我知道您的想法,可能需要一些推文,我很乐意根据您的反馈做。
罗布
$ gawk -f foo.awk event_log.txt | sort -n -k2
SERVER01,04/09/2012 04:43:06 PM,8,FOO,TOO
SERVER01,04/10/2012 04:43:06 PM,8,FOO,TOO
SERVER01,04/11/2012 04:43:06 PM,8,FOO,TOO
SERVER02,04/09/2012 04:43:06 PM,8,FOO,TOO
SERVER02,04/10/2012 04:43:06 PM,8,FOO,TOO
SERVER02,12/31/2012 11:59:59 PM,8,FOO,TOO
SERVER01,01/01/2013 12:00:01 AM,8,FOO,TOO
foo.awk在哪里
BEGIN { FS = "," }
{ print $1"," prepadDate($2) "," $3 "," $4 "," $5 }
#
# Returnes a useful timestamp given the timestamp received in event logs on Windows Server 2003
#
function prepadDate(winSrvr2003ts) {
padded_day = ""
padded_month = ""
year = ""
padded_date = ""
split(winSrvr2003ts,numbers," ")
split(numbers[1], date, "/")
split(numbers[2], time, ":")
antePostMeridian = numbers[3]
padded_day = prePadAZero(date[1])
padded_month = prePadAZero(date[2])
year = date[3]
padded_hour = prePadAZero(time[1])
minute = time[2]
seconds = time[3]
#
# Alter the return statememnt to format the timestamp according to your needs
# rememebering that string concatenation in gawk is simply a space.
#
return padded_day "/" padded_month "/" year " " padded_hour ":" minute ":" seconds " " antePostMeridian
}
#
# Prepend a zero to number if it is a single digit
#
function prePadAZero(number){
if (length(number) == 1)
padded = "0" number
else
padded = number
return padded
}