I have two log files with different date/time formats that I want to merge.
The first file is a standard Apache access_log file, which looks like this:
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45
173.37.239.93 - abcdefg [29/Feb/2016:16:57:52 -0600] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781
... where 'abcdefg' is the user ID.
The second log file has the following format:
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
My goal is to merge the two files and sort the combined lines by date/time.
This is what I have so far:
use strict;
use warnings;

my $logfile1 = "catalina.out";
my $logfile2 = "access_log";

# catalina.out: keep only lines that start with a timestamp
open(my $fh1, '<', $logfile1) or die("Could not open $logfile1: $!");
while (my $line = <$fh1>) {
    chomp($line);
    if ($line =~ /^2016.+$/) {
        print $line . "\n";
    }
}
close($fh1);

# access_log: keep only lines that contain an Apache-style timestamp
open(my $fh2, '<', $logfile2) or die("Could not open $logfile2: $!");
while (my $line = <$fh2>) {
    chomp($line);
    if ($line =~ /\d{2}\/\S{3}\/\d{4}:\d{2}:\d{2}:\d{2} -\d{4}/) {
        print $line . "\n";
    }
}
close($fh2);

# format of file 1 (access_log)
#   DD/MMM/YYYY:HH:MM:SS -NNNN
#   29/Feb/2016:20:03:07 -0600
# format of file 2 (catalina.out)
#   YYYY-MM-DD HH:MM:SS,NNN
#   2016-02-12 08:16:03,631
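I am basically only interested in lines that carry date/time information, so the code above discards all other lines.
Where I am stuck:
1) How do I convert the date/time format of file 1 into the date/time format of file 2?
2) I am not interested in the IP address, but I do want to keep the user ID. Since file 1 does not start with the date/time the way file 2 does, how do I sort by date after converting and merging the two?
Any help would be greatly appreciated!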
Answer 0 (score: 0)
I won't write the script for you, but the general shape should look like this:
use strict;
use warnings;
use DateTime::Format::Strptime;

sub firstFileLine {
    # parse line as needed, and return a hash reference with 2 keys:
    # 1. `line`: the contents of the line, possibly edited
    # 2. `ts`: the UTC unix timestamp, via the DateTime::Format::Strptime module
}

sub secondFileLine {
    # similar to `firstFileLine`, return a hash reference
}

# FILE1 and FILE2 are assumed to be filehandles already opened on the two logs
my @firstLines  = map { firstFileLine($_) } <FILE1>;
my @secondLines = map { secondFileLine($_) } <FILE2>;
my @sorted = map { $_->{line} } sort { $a->{ts} <=> $b->{ts} } (@firstLines, @secondLines);
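Read the documentation on DateTime::Format::Strptime, map, and sort. You are lucky that Perl is one of the best-documented languages out there, so take full advantage of that!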
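As a rough illustration of how the two stubs might be filled in, here is a minimal sketch. It swaps in the core Time::Piece module in place of DateTime::Format::Strptime so that it runs without installing anything, and it feeds a few of the sample lines from the question through the parsers instead of reading real files. The names `first_file_line`, `second_file_line`, `@access` and `@catalina` are made up for this example, and the sketch assumes both logs record time in the same zone (the -0600 offset is ignored).

```perl
use strict;
use warnings;
use Time::Piece;

# Apache access_log line -> { line, ts }, or an empty list if no timestamp is found
sub first_file_line {
    my ($line) = @_;
    my ($stamp) = $line =~ m{\[(\d\d/\w{3}/\d{4}:\d\d:\d\d:\d\d) } or return;
    my $t = Time::Piece->strptime($stamp, '%d/%b/%Y:%H:%M:%S');
    return { line => $line, ts => $t->epoch };
}

# catalina.out line -> { line, ts }, or an empty list if no timestamp is found
sub second_file_line {
    my ($line) = @_;
    my ($stamp) = $line =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),/ or return;
    my $t = Time::Piece->strptime($stamp, '%Y-%m-%d %H:%M:%S');
    return { line => $line, ts => $t->epoch };
}

# Sample lines copied from the question (access_log lines deliberately out of order)
my @access = (
    q{173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781},
    q{127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45},
);
my @catalina = (
    q{2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]},
);

my @first_lines  = map { first_file_line($_) } @access;
my @second_lines = map { second_file_line($_) } @catalina;

# Sort the combined records numerically by epoch, then keep only the text
my @sorted = map { $_->{line} }
             sort { $a->{ts} <=> $b->{ts} } (@first_lines, @second_lines);

print "$_\n" for @sorted;
```

Because each parser returns an empty list when the pattern does not match, `map` silently drops lines without a timestamp, which matches the filtering the question already does.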
Answer 1 (score: 0)
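Here is a solution using Time::Piece. I used Inline::Files to simulate the two files. With real log files, you would open them like this:
my $logfile1 = "catalina.out";
my $logfile2 = "access_log";
open my $log1_fh, '<', $logfile1 or die "Cannot open $logfile1: $!";
open my $log2_fh, '<', $logfile2 or die "Cannot open $logfile2: $!";
The program looks like this, and it gives what I think is the result you want.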
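#!/usr/bin/perl
use strict;
use warnings;
use Inline::Files;
use Time::Piece;

my %data;    # converted date/time => array ref of the lines with that timestamp

# FILE2 (catalina.out format): lines already start with YYYY-MM-DD HH:MM:SS
while (<FILE2>) {
    # get date_time; skip lines without a leading timestamp
    my ($dt) = /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),/ or next;
    push @{ $data{$dt} }, $_;
}

# FILE1 (access_log format): convert DD/Mon/YYYY:HH:MM:SS to the FILE2 format
my $format = '%d/%b/%Y:%H:%M:%S';
while (<FILE1>) {
    # capture the timestamp up to the first space (the -0600 offset is dropped)
    my ($stamp) = /\[(\S+)/ or next;
    my $t  = Time::Piece->strptime($stamp, $format);
    my $dt = $t->strftime('%Y-%m-%d %H:%M:%S');
    s/^\S+ (?:- )+//;        # drop the IP address and "-" placeholders, keep the user ID
    s/(?<=\[)[^\]]+/$dt/;    # replace the bracketed timestamp with the converted one
    push @{ $data{$dt} }, $_;
}

# print the lines grouped by timestamp, in sorted key order
for my $dt (sort keys %data) {
    print for @{ $data{$dt} };
}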
__FILE1__
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45
173.37.239.93 - abcdefg [29/Feb/2016:16:57:52 -0600] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781
__FILE2__
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
I used a hash, %data, to store the lines. The key is the converted date, so later in the program you can print the lines in sorted order.
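The reason a plain `sort keys` is enough is that the YYYY-MM-DD HH:MM:SS form sorts chronologically as an ordinary string, since the most significant field comes first and every field is zero-padded. A small sketch of that idea, using timestamps taken from the sample logs (the variable names `@apache`, `@keys` and `@ordered` are only for illustration):

```perl
use strict;
use warnings;
use Time::Piece;

# Apache-style timestamps from the access_log sample, deliberately out of order
my @apache = ('29/Feb/2016:16:57:53', '29/Feb/2016:16:57:52');

# Convert each one to the catalina.out style, as the program above does
my @keys = map {
    Time::Piece->strptime($_, '%d/%b/%Y:%H:%M:%S')->strftime('%Y-%m-%d %H:%M:%S')
} @apache;

# Add a timestamp that is already in the target format
push @keys, '2016-02-12 08:16:03';

# Default string sort; no numeric comparison or date parsing needed
my @ordered = sort @keys;
print "$_\n" for @ordered;
```

The raw Apache form would not behave this way, because it starts with the day of the month and uses month names, so it has to be converted before it can serve as a sort key.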
The output of the program is:
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
[2016-02-29 16:57:52] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567
[2016-02-29 16:57:52] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40
[2016-02-29 16:57:52] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45
abcdefg [2016-02-29 16:57:52] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492
abcdefg [2016-02-29 16:57:53] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698
abcdefg [2016-02-29 16:57:53] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781
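Note that the converted access_log timestamps above keep the local wall-clock time: the pattern only captures the text up to the first space, so the -0600 offset is discarded. That is fine if both logs are written in the same time zone. If they are not, one possible approach is to apply the offset by hand before formatting. The sketch below is only an illustration under that assumption; `apache_to_utc` is a hypothetical helper name, and whether catalina.out is in UTC or local time is something you would need to check on your own system.

```perl
use strict;
use warnings;
use Time::Piece;

# Convert "DD/Mon/YYYY:HH:MM:SS +/-HHMM" to a UTC "YYYY-MM-DD HH:MM:SS" string.
# Returns an empty list if the input does not look like an Apache timestamp.
sub apache_to_utc {
    my ($stamp) = @_;
    my ($local, $sign, $hh, $mm) = $stamp =~ m{^(\S+) ([+-])(\d\d)(\d\d)$} or return;
    my $t = Time::Piece->strptime($local, '%d/%b/%Y:%H:%M:%S');
    # Offset in seconds east of UTC; negative for zones west of Greenwich
    my $offset = ($hh * 3600 + $mm * 60) * ($sign eq '-' ? -1 : 1);
    # UTC = local time minus the offset
    return ($t - $offset)->strftime('%Y-%m-%d %H:%M:%S');
}

print apache_to_utc('29/Feb/2016:16:57:52 -0600'), "\n";    # 2016-02-29 22:57:52
```

To use it in the program above, the capture would have to include the offset as well, for example by matching everything between the square brackets instead of stopping at the first space.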