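Recursively compare two files using the awk command

Time: 2014-04-28 13:07:48

Tags: shell awk

I want to compare two files: 1) compare each query result; 2) compare only the first row of each query's output; 3) compare the time (column 3): if the time in the first file is earlier than the time in the second file, print PO_NUM, otherwise do nothing.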

File1:

C:\script>call transaction 1OPOP

C:\script>Select ID, PO_ID, TIME, DES From Table
ID          PO_NUM          TIME                        DES         
-------     ------------    ---------------             -----
11232323    1OPOP           2012-08-01-23.02.50.040000  SAMPLE  
11232324    1OPOP           2013-09-01-23.02.50.040000  SAMPLE  
11232325    1OPOP           2014-09-01-23.02.50.040000  SAMPLE  
11232326    1OPOP           2015-09-01-23.02.50.040000  SAMPLE
4 record(s) selected.

C:\script>call transaction 1XDXD

C:\script>Select ID, PO_ID, TIME, DES From Table
ID          PO_NUM          TIME                        DES         
-------     ------------    ---------------             -----
11232323    1XDXD           2012-07-01-23.02.50.040000  SAMPLE  
11232324    1XDXD           2013-09-01-23.02.50.040000  SAMPLE  
11232325    1XDXD           2014-08-01-23.02.50.040000  SAMPLE  
3 record(s) selected.

C:\script>call transaction 1IOIO

C:\script>Select ID, PO_ID, TIME, DES From Table
ID          PO_NUM          TIME                        DES         
-------     ------------    ---------------             -----
11232323    1IOIO           2011-06-01-23.02.50.040000  SAMPLE  
11232324    1IOIO           2012-09-01-23.02.50.040000  SAMPLE  
2 record(s) selected.

File2:

C:\script>call transaction 1OPOP

C:\script>Select ID, PO_ID, TIME, DES From Table
ID          PO_NUM          TIME                    DES         
-------     ------------    ---------------             -----
11232323    1OPOP           2012-09-01-23.02.50.040000  SAMPLE  
11232324    1OPOP           2013-09-01-23.02.50.040000  SAMPLE  
11232325    1OPOP           2014-09-01-23.02.50.040000  SAMPLE  
11232326    1OPOP           2015-09-01-23.02.50.040000  SAMPLE
4 record(s) selected.

C:\script>call transaction 1XDXD

C:\script>Select ID, PO_ID, TIME, DES From Table
ID          PO_NUM          TIME                    DES         
-------     ------------    ---------------             -----
11232323    1XDXD           2012-08-01-23.02.50.040000  SAMPLE  
11232324    1XDXD           2013-09-01-23.02.50.040000  SAMPLE  
11232325    1XDXD           2014-08-01-23.02.50.040000  SAMPLE  
3 record(s) selected.

C:\script>call transaction 1IOIO

C:\script>Select ID, PO_ID, TIME, DES From Table
ID          PO_NUM          TIME                DES         
-------     ------------    ---------------             -----
11232323    1IOIO           2011-05-01-23.02.50.040000  SAMPLE  
11232324    1IOIO           2012-09-01-23.02.50.040000  SAMPLE  
2 record(s) selected.   

2 answers:

Answer 0: (score: 0)

You can try the following Perl script:

#! /usr/bin/perl

use v5.12;

use File::Slurp qw(read_file);
use Time::Piece;

my $fmt='%Y-%m-%d-%H.%M.%S';  # timestamps are year-month-day, e.g. 2012-08-01-23.02.50
my @files=qw(file1 file2);
my @data;
for my $file (@files) {
    my $str=read_file($file);
    my @a=$str=~/-----\s*\n(.*?)\n\d+ record/sg;
    my $i=0;
    for (@a) {
        my $line=@{[split (/\n/)]}[0];
        my @fld=split(" ",$line);
        my $d=$fld[2];
        my ($date)=$d=~/(.*)\.[^.]*$/;
        $data[$i]={} if ! defined $data[$i];
        $data[$i]->{$file}{time}=Time::Piece->strptime($date, $fmt);
        $data[$i]->{id}=$fld[1];
        $i++;
    }      
}
for my $row (@data) {
    my $t1=$row->{file1}->{time};
    my $t2=$row->{file2}->{time};
    if ($t1<$t2) {  # print only when file1's time is earlier than file2's
        say $row->{id};
    }
}

Answer 1: (score: 0)

An example:

awk '/^---/ {b=1;next} b==1{if(NR==FNR) a[$2]=$3; else if(a[$2]<$3) print $2; b=0}' file1 file2
1OPOP
1XDXD

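Here is the breakdown:

  • Set b=1 when a line begins with "---".
  • When b==1, the line is expected to be the first row of data and is processed.
  • For file1, store the time in a[] under the PO_NUM; for file2, compare the stored time lexically with the time in file2 and print accordingly.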
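The breakdown above can also be written out as a multi-line script. This is a minimal sketch, not the answerer's exact code: the function name compare_first_rows, the variable names, the trimmed sample files, and the added ($2 in t) guard (which skips any PO_NUM that never appears in the first file) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sample files, trimmed from the question's data
# (one data row per query, prompt lines omitted).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/file1" <<'EOF'
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232323    1OPOP           2012-08-01-23.02.50.040000  SAMPLE
1 record(s) selected.
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232323    1XDXD           2012-07-01-23.02.50.040000  SAMPLE
1 record(s) selected.
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232323    1IOIO           2011-06-01-23.02.50.040000  SAMPLE
1 record(s) selected.
EOF

cat > "$tmp/file2" <<'EOF'
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232323    1OPOP           2012-09-01-23.02.50.040000  SAMPLE
1 record(s) selected.
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232323    1XDXD           2012-08-01-23.02.50.040000  SAMPLE
1 record(s) selected.
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232323    1IOIO           2011-05-01-23.02.50.040000  SAMPLE
1 record(s) selected.
EOF

# compare_first_rows FILE1 FILE2
# Prints each PO_NUM whose first-row time in FILE1 sorts before
# the first-row time of the same PO_NUM in FILE2.
compare_first_rows() {
  awk '
    # The dashed separator means the next line is the first data row.
    /^---/ { first = 1; next }

    first {
      if (NR == FNR) {
        # First file: remember the time (column 3) under PO_NUM (column 2).
        t[$2] = $3
      } else if (($2 in t) && (t[$2] < $3)) {
        # Second file: string comparison of the two timestamps.
        print $2
      }
      first = 0
    }
  ' "$1" "$2"
}

compare_first_rows "$tmp/file1" "$tmp/file2"
```

The comparison is a plain string comparison. That is enough here because the timestamps are fixed-width and ordered from year down to fractional seconds, so string order and chronological order agree.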

If the IDs must also match, the awk becomes slightly more complicated.


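Here is a slightly more complicated version, for the case where the IDs of the query results in the second file are out of order compared with the first file: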

awk 'NR==FNR {if($0 ~ /^---/) {b=1} else if(b==1) {a[$1$2]=$3; b=0} next} $1$2 in a {if(a[$1$2]<$3) print $2}' file1 file2

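The main difference is that the key in a is now built from the ID and PO_NUM of the first result of each query in file1. For the second file, every entry in the results is checked for a matching ID/PO_NUM combination in the array. Also, since b is now only useful in the first file, I rearranged the conditional tests slightly.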
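To make the ID-keyed variant easier to follow, here is a hedged, expanded sketch of it run against hypothetical sample files in which the second file lists its rows in a different order. The function name compare_by_id and the sample data are assumptions. The sketch also swaps the plain $1$2 concatenation for awk's comma subscript (joined with SUBSEP), a small variation that keeps different ID/PO_NUM pairs from producing the same key.

```shell
#!/bin/sh
# Hypothetical sample files; file2 lists the rows of each query
# in a different order than file1.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/file1" <<'EOF'
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232323    1OPOP           2012-08-01-23.02.50.040000  SAMPLE
11232324    1OPOP           2013-09-01-23.02.50.040000  SAMPLE
2 record(s) selected.
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232323    1IOIO           2011-06-01-23.02.50.040000  SAMPLE
11232324    1IOIO           2012-09-01-23.02.50.040000  SAMPLE
2 record(s) selected.
EOF

cat > "$tmp/file2" <<'EOF'
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232324    1OPOP           2013-09-01-23.02.50.040000  SAMPLE
11232323    1OPOP           2012-09-01-23.02.50.040000  SAMPLE
2 record(s) selected.
ID          PO_NUM          TIME                        DES
-------     ------------    ---------------             -----
11232324    1IOIO           2012-09-01-23.02.50.040000  SAMPLE
11232323    1IOIO           2011-05-01-23.02.50.040000  SAMPLE
2 record(s) selected.
EOF

# compare_by_id FILE1 FILE2
# Keys on (ID, PO_NUM) of the first row of each query in FILE1, then
# scans every line of FILE2 for a row with the same pair.
compare_by_id() {
  awk '
    NR == FNR {
      if ($0 ~ /^---/) {
        first = 1                 # next line is the first data row
      } else if (first) {
        t[$1, $2] = $3            # store time under (ID, PO_NUM)
        first = 0
      }
      next                        # never fall through for the first file
    }

    # Second file: any line whose (ID, PO_NUM) pair was stored above.
    {
      if ((($1, $2) in t) && (t[$1, $2] < $3)) print $2
    }
  ' "$1" "$2"
}

compare_by_id "$tmp/file1" "$tmp/file2"
```

With this sample data only 1OPOP is printed: its stored row (ID 11232323) appears second in file2 but is still found, while the matching 1IOIO row in file2 carries an earlier time than the one in file1.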