Question

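I'm writing a Perl script, and I need to capture some lines from a garbage collection log and write them to a file.

The log is located on a remote host, and I'm connecting with the Net::OpenSSH module.

I need to read the latest log file.

In the shell, I can find the latest log with the following commands: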

cd builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin
ls -lat | grep '.log$' | tail -1

This returns the latest log:

-rw-r--r--   1 load     other    2406173 Jul 11 11:53 18156.stdout.log

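So in Perl, I'd like to be able to write something that locates that log and opens it for reading.

Once I have that log file, I want to print all lines whose timestamp is greater than a specified time. The specified time is the timestamp of the latest log message minus a $Runtime variable.

Here are the last messages of the garbage collection log: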

                                      ...

73868.629: [GC [PSYoungGen: 941984K->14720K(985216K)] 2118109K->1191269K(3065984K), 0.2593295 secs] [Times: user=0.62 sys=0.00, real=0.26 secs]
73873.053: [GC [PSYoungGen: 945582K->12162K(989248K)] 2122231K->1189934K(3070016K), 0.2329005 secs] [Times: user=0.60 sys=0.01, real=0.23 secs]

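So if $Runtime has a value of 120 seconds, I need to print all lines from timestamp (73873.053 - 120) seconds onward.

In the end, my script would look something like this...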

open GARB, ">", "./report/archive/test-$now/GC.txt" or die "Unable to create file: $!";

my $ssh2 = Net::OpenSSH->new(
  $pathHost,
  user => $pathUser,
  password => $pathPassword
);
$ssh2->error and die "Couldn't establish SSH connection: ". $ssh2->error; 

# Something to find and open the log file.
print GARB #Something to return certain lines.
close GARB;

I realize this is somewhat similar to this question, but I can't figure out a way to tailor it to my requirements. Any help is much appreciated!

Answer 1

Find the latest file and feed it to perl:

 LOGFILE=`ls -t1 "$DIR" | grep '\.log$' | head -1`
 if [ -z "$LOGFILE" ]; then
   echo "$0: No log file found - exiting"
   exit 1
 fi

 # ls prints bare file names, so prepend the directory
 perl myscript.pl "$DIR/$LOGFILE"

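The pipeline on the first line lists the files in the directory (names only, one per line, most recent first), filters for log files, and then returns only the first one.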
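If the directory is readable from the machine running the script, the same selection can be sketched in Perl itself. This is only an illustration under that assumption (in the question the directory is remote); `newest_log` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Return the path of the most recently modified *.log file in $dir,
# or nothing if the directory holds no such file.
# Assumes $dir is readable locally; newest_log is a hypothetical name.
sub newest_log {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @logs = grep { /\.log$/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return unless @logs;

    # Element 9 of stat is the modification time; sort newest first.
    my ($newest) = sort { (stat "$dir/$b")[9] <=> (stat "$dir/$a")[9] } @logs;
    return "$dir/$newest";
}
```

Sorting on the modification time mirrors what `ls -t` does, and the `\.log$` pattern escapes the dot so that a name such as `catalog` is not picked up by accident.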

I don't know how to translate your timestamps into something I can understand and do math and comparisons on. But in general:

$threshold_ts = $time_specified - $offset;
while (<>) {
  my ($line_ts) = split(/\s/, $_, 2);
  print if compare_time_stamps($line_ts, $threshold_ts);
}

Writing the threshold calculation and the comparison is left as an exercise for the reader.
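For the log format shown in the question, the leading number looks like seconds of JVM uptime, so the threshold and comparison can probably be plain arithmetic. A minimal sketch under that assumption; `line_timestamp` and `lines_within` are hypothetical helper names:

```perl
use strict;
use warnings;

# Extract the leading stamp ("73873.053: [GC ...") as a number,
# or return undef if the line does not start with one.
sub line_timestamp {
    my ($line) = @_;
    return $line =~ /^(\d+(?:\.\d+)?):/ ? $1 : undef;
}

# Return the lines stamped later than (latest stamp - $runtime).
sub lines_within {
    my ($runtime, @lines) = @_;
    my @stamps = map { scalar line_timestamp($_) } @lines;

    # The last stamped line defines "now" for the log.
    my ($latest) = grep { defined } reverse @stamps;
    return () unless defined $latest;

    my $threshold = $latest - $runtime;
    return map  { $lines[$_] }
           grep { defined $stamps[$_] && $stamps[$_] > $threshold }
           0 .. $#lines;
}
```

This version holds all lines in memory, which is fine for a sketch but may not suit a very large log.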

Answer 2

I think the Net::OpenSSH documentation page gives a pretty good baseline for this:

my ($rout, $pid) = $ssh->pipe_out("cat /tmp/foo") or
  die "pipe_out method failed: " . $ssh->error;

while (<$rout>) { print }
close $rout;

But instead, you want to do some discarding work:

my ($rout, $pid) = $ssh->pipe_out("cat /tmp/foo") or
  die "pipe_out method failed: " . $ssh->error;

while ( defined( my $line = <$rout> ) ) {
    # The timestamp is everything before the first ':'
    my $ts = substr( $line, 0, index( $line, ':' ) );
    next if $ts < $start;                # discard lines before the window
    last if $ts > $start + $duration;    # stop once we are past the window
    print $line;
}
close $rout;

Answer 3

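Here is an untested approach. I haven't used Net::OpenSSH, so there may be better ways to do this. I'm not even sure it works. The parsing part is what I did test, for what it's worth.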

use strict; use warnings;
use Net::OpenSSH;

my $Runtime = 120;
my $now = time;
open my $garb, '>',
  "./report/archive/test-$now/GC.txt" or die "Unable to create file: $!";
# $pathHost, $pathUser and $pathPassword are assumed to be set elsewhere.
my $ssh2 = Net::OpenSSH->new(
  $pathHost,
  user => $pathUser,
  password => $pathPassword
);
$ssh2->error and die "Couldn't establish SSH connection: ". $ssh2->error;

# Something to find and open the log file.
my $fileCapture = $ssh2->capture(
  q~ls -lat builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin |grep '.log$' |tail -1~
);
$fileCapture =~ m/(\S+)$/    # The file name is the last field of the ls line
  or die "No log file found";
my $filename = $1;           # And save it in $filename

# Find the time of the last log line
my $latestTimeCapture = $ssh2->capture(
  "tail -n 1 builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin/$filename");
$latestTimeCapture =~ m/^([\d\.]+):/
  or die "No timestamp on the last log line";
my $logTime = $1 - $Runtime;

# open2 returns the remote command's stdin first and its stdout second;
# $in is the handle we read the file contents from.
my ($remoteStdin, $in, $pid) = $ssh2->open2(
  "cat builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin/$filename");
close $remoteStdin;          # cat takes no input
while (<$in>) {
  # Something to return certain lines.
  if (m/^([\d\.]+):/ && $1 > $logTime) {
    print $garb $_; # Assume the \n is still in there
  }
}

waitpid($pid, 0);

close $garb;

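It uses your ls line with the capture method to find the file. Then it opens a pipe through the SSH tunnel to read that file. $in is the filehandle for that pipe, which we can read from.

Since we process the file line by line, starting at the top, we first need to grab the last line to get the last timestamp. That is done with tail and, again, the capture method.

Once we have that, we read from the pipe line by line. This is now a simple regex (the same one used above): grab the timestamp and compare it to the time we set earlier (the last timestamp minus the 120 seconds). If it is higher, print the line to the output filehandle.

According to the docs, we have to call waitpid on the $pid returned by $ssh2->open2 so that the subprocess is reaped, so we do that before closing our output file.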

Answer 4

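You either need to keep an accumulator containing all the lines (more memory) or iterate through the log more than once (more time).

With an accumulator: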

my @accumulated_lines;
while (<$log_fh>) {
    push @accumulated_lines, $_;

    # Your processing to get $Runtime goes here...

    if ($Runtime > $TOO_BIG) {
        my ($current_timestamp) = /^(\d+(?:\.\d*))/;
        my $start_timestamp = $current_timestamp - $Runtime;

        for my $previous_line (@accumulated_lines) {
            my ($previous_timestamp) = $previous_line =~ /^(\d+(?:\.\d*))/;
            next unless $previous_timestamp <= $current_timestamp;
            next unless $previous_timestamp >= $start_timestamp;
            print $previous_line;
        }
    }
}

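Or you can iterate through the log twice, which is similar but without the nested loop. I've assumed you might have more than one of these spans in your log.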

my @report_spans;
while (<$log_fh>) {
    # Your processing to get $Runtime goes here...

    if ($Runtime > $TOO_BIG) {
        my ($current_timestamp) = /^(\d+(?:\.\d*))/;
        my $start_timestamp = $current_timestamp - $Runtime;

        push @report_spans, [ $start_timestamp, $current_timestamp ];
    }
}

# Don't bother continuing if there's nothing to report
exit 0 unless @report_spans;

# Start over
seek $log_fh, 0, 0;

while (<$log_fh>) {
    my ($previous_timestamp) = /^(\d+(?:\.\d*))/;
    SPAN: for my $span (@report_spans) {
        my ($start_timestamp, $current_timestamp) = @$span;

        next unless $previous_timestamp <= $current_timestamp;
        next unless $previous_timestamp >= $start_timestamp;
        print; # same as print $_;

        last SPAN; # don't print out the line more than once, if that's even possible
    }
}

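If you might have overlapping spans, the latter has the advantage of not showing the same log line twice. If you don't have overlapping spans, you can optimize the first version by resetting the accumulator each time you output:

@accumulated_lines = ();

to save memory.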
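A related way to bound memory, sketched here as an assumption rather than as part of the answer above, is to prune the accumulator on every line so it only ever holds the last $Runtime seconds of log. `push_and_prune` is a hypothetical helper name, and the sketch assumes timestamps never decrease within the log.

```perl
use strict;
use warnings;

# Append $line to the window (an array ref of [timestamp, line] pairs)
# and drop entries older than $runtime seconds relative to the newest.
# Lines without a leading timestamp are ignored.
sub push_and_prune {
    my ($window, $runtime, $line) = @_;
    my ($ts) = $line =~ /^(\d+(?:\.\d*)?)/ or return;

    push @$window, [ $ts, $line ];
    # The entry just pushed always survives, so the window never empties.
    shift @$window while $window->[0][0] < $ts - $runtime;
    return;
}
```

After the last line has been read, the window holds exactly the lines within $Runtime seconds of the final timestamp, so it can be printed directly.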

Answer 5

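Use SFTP to access the remote filesystem. You can use Net::SFTP::Foreign (alone or via Net::OpenSSH).

It will allow you to list the contents of the remote filesystem, pick the file you want to process, open it and manipulate it as if it were a local file.

The only tricky thing you need to do is read the lines backwards, for instance by reading chunks of the file starting from the end and breaking them into lines.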
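The backwards read can be sketched as follows for any seekable filehandle opened in byte mode. Whether a given remote handle supports `seek`, `tell` and `read` in this way is an assumption to verify against the module's documentation; `tail_lines`, `$keep_going` and `$chunk_size` are hypothetical names.

```perl
use strict;
use warnings;

# Return the lines of a seekable filehandle newest first, reading
# fixed-size chunks from the end. Stops as soon as $keep_going->($line)
# returns false, so only the tail of the file is read.
sub tail_lines {
    my ($fh, $keep_going, $chunk_size) = @_;
    $chunk_size ||= 4096;
    seek $fh, 0, 2 or die "seek failed: $!";    # 2 = SEEK_END
    my $pos = tell $fh;
    my ($partial, @result) = ('');

    while ($pos > 0) {
        my $len = $pos < $chunk_size ? $pos : $chunk_size;
        $pos -= $len;
        seek $fh, $pos, 0 or die "seek failed: $!";
        read($fh, my $chunk, $len) == $len or die "short read";

        # split /^/ keeps the newline at the end of every line.
        my @lines = split /^/, $chunk . $partial;
        # The first piece may be the tail of a line that starts in an
        # earlier chunk, so hold it back unless we are at the file start.
        $partial = $pos > 0 ? shift @lines : '';

        for my $line (reverse @lines) {
            return @result unless $keep_going->($line);
            push @result, $line;
        }
    }
    return @result;
}
```

A caller would typically pass a callback that compares the line's timestamp with the threshold, then `reverse` the result to restore chronological order before printing.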

Open the latest log file and print lines after a certain timestamp
