Compare two text files and print the matching IDs with their related sub-IDs and timestamps

Time: 2015-05-20 17:51:23

Tags: perl

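I am trying to compare two tabular text files, file1.txt (the lookup table) and file2.txt (the master file), and print the matching IDs together with two other columns from the master file (Rev ID and Date Released).

File 1: lookup table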

Name                IRR ID
slic73p1hsicbxttop  99034438
c73p1avrsrldo150top 99034238
c73p1avrfusevrmtop  99034201 

File 2: master file

Type Name               Rev ID   IRR ID   PP Group      Date Released   PP Category                              
Comp c73p1avrfusevrmtop PROD_2_5 99034201 SEG RIP Reuse 5/3/2015 6:59   Hard   
Comp c73p1avrfusevrmtop PROD_2_4 99034201 SEG RIP Reuse 4/23/2015 10:27 Hard   
Comp c73p1avrfusevrmtop PROD_2_3 99034201 SEG RIP Reuse 3/17/2015 23:51 Hard   
Comp c73p1avrfusevrmtop PROD_2_2 99034201 SEG RIP Reuse 2/1/2015 11:27  Hard

Expected output (the table also contains other rows that do not match):

IRR ID   Rev ID   Date Released (date to be printed in a chronological order)
99034201 PROD_2_5 5/3/2015 6:59    
99034201 PROD_2_4 4/23/2015 10:27  
99034201 PROD_2_3 3/17/2015 23:51  
99034201 PROD_2_2 2/1/2015 11:27  

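I finally got my code to run, but it does not do what I need. My requirement is to take the IRR IDs from lookup.txt, match them against c73p1avrfusevrmtop.txt (the master table), and print only the matching IRR ID, the Rev IDs related to that IRR ID, and the date each Rev ID was released. Instead, my program prints every IRR ID, Rev ID and Date Released without matching them against the lookup table. I am not sure where the bug is. Here is my program: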

#!/bin/env perl
#use warnings;
 use strict; 
 use autodie; 
 use Data::Dumper;

 my $lookup_qfn = 'lookup.txt';
 my $master_qfn = 'c73p1avrfusevrmtop.txt';

 my %ids_to_lookup;
    {
    open(my $fh, '<:encoding(UTF-8)', $lookup_qfn);
    <$fh>;  # Header
    while (<$fh>) {
       my @fields = split();
       #print Dumper(@fields);
       ++$ids_to_lookup{$fields[0]}{$fields[1]};
    }

}
my @output;
 {
    open(my $fh, '<', $master_qfn);
    <$fh>;  # Header
     print(join("    ", "IRR ID", "Rev ID", "Date Released"), "\n");
 } 

 {
    open(my $fh, '<:encoding(UTF-8)', $master_qfn);
    <$fh>;  # Header
    while (<$fh>) {
       my @fields = split();
       #print Dumper(@fields);
       $ids_to_lookup{$fields[1]}{$fields[3]};

        print(join("  ", @fields[3,2,7,8]), "\n");
    }
}

This is my output:
     IRR ID    Rev ID    Date Released
     99034201  PROD_2_5  2015-05-03  6:59:09
     99034201  PROD_2_4  2015-04-23  10:27:38
     99034201  PROD_2_3  2015-03-17  23:51:23
     99034201  PROD_2_2  2015-02-01  11:27:55
     99034201  PROD_2_1  2014-12-26  6:43:14
     99034201  PROD_2_0  2014-12-20  21:09:06
     **99038319  PROD_1_7  2014-12-17  21:38:19
     99038319  PROD_1_6  2014-12-04  6:24:26
     99038319  PROD_1_5  2014-11-17  8:51:49**
     99034201  POLO_2_0  2014-10-30  23:01:49
     99034201  PROD_1_3  2014-06-16  6:58:50
     99034201  PROD_1_2  2014-05-10  2:37:42
     99034201  PROD_1_1  2014-04-27  22:58:48
     99034201  PROD_1_0  2014-01-17  10:15:02
     99034201  POLO_1_1  2014-01-07  11:18:45
     99034201  POLO_1_0  2013-10-20  18:23:11
     99034201  RTL1P0_1_0  2013-06-26  11:33:03

I know I am bothering you all, but I am just trying to learn Perl through all of this effort.

3 Answers:

Answer 0 (score: 2)

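This will work for you. I think ikegami overlooked the requirement to sort the output, since his solution contains no code to do that.

I have assumed that the IRR ID is sufficient to identify a record, and that there cannot be multiple entries with the same ID and different names.

Also, as you said, this sorts the output in chronological order, even though your requested output is in *reverse* chronological order.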

#!/bin/env perl

use strict;
use warnings;
use 5.010;
use autodie;

use open qw/ :std :encoding(UTF-8) /;

use Time::Piece;

my ($filename1, $filename2) = qw/ lookup.txt c73p1avrfusevrmtop.txt /;

# Collect the IRR IDs listed in the lookup table
my %ids_required;

open my $fh, '<', $filename1;
while ( <$fh> ) {
  my $irr_id = (split)[1];
  next unless defined $irr_id;    # skip blank lines
  ++$ids_required{$irr_id};
}

my @rows;

open $fh, '<', $filename2;
while ( <$fh> ) {

  my ($rev_id, $irr_id) = (split)[2,3];
  next unless defined $irr_id and $ids_required{$irr_id};

  # Pull out the date and time, e.g. "5/3/2015 6:59"; skip rows without one
  my @date_time = m< (\d{1,2}/\d{1,2}/\d{4}) \s+ (\d{1,2}:\d{2}) >x or next;
  my $released  = join ' ', @date_time;

  # Keep the original text for printing and a Time::Piece object for sorting
  my $sort_key = Time::Piece->strptime($released, '%m/%d/%Y %H:%M');
  push @rows, [ $irr_id, $rev_id, $released, $sort_key ];
}

print join("\t", 'IRR ID', 'Rev ID', 'Date Released'), "\n";
print join("\t", @$_[0..2]), "\n" for sort { $a->[3] <=> $b->[3] } @rows;

Output

IRR ID  Rev ID  Date Released
99034201    PROD_2_2    2/1/2015 11:27
99034201    PROD_2_3    3/17/2015 23:51
99034201    PROD_2_4    4/23/2015 10:27
99034201    PROD_2_5    5/3/2015 6:59
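
The question's expected output lists the newest release first. A minimal sketch of that ordering follows, assuming the dates are in the `M/D/YYYY H:MM` form shown in the sample master file. `sort_newest_first` is a hypothetical helper name introduced for illustration, not part of the answer above.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: takes rows of [ irr_id, rev_id, date_text ] and
# returns them ordered newest first.
sub sort_newest_first {
    my @rows = @_;
    # Parse each date once, sort on the epoch value, then drop the sort key
    return map  { $_->[1] }
           sort { $b->[0] <=> $a->[0] }
           map  { [ Time::Piece->strptime($_->[2], '%m/%d/%Y %H:%M')->epoch, $_ ] }
           @rows;
}

# Sample rows taken from the master file shown in the question
my @rows = (
    [ 99034201, 'PROD_2_2', '2/1/2015 11:27'  ],
    [ 99034201, 'PROD_2_5', '5/3/2015 6:59'   ],
    [ 99034201, 'PROD_2_3', '3/17/2015 23:51' ],
    [ 99034201, 'PROD_2_4', '4/23/2015 10:27' ],
);

print join("\t", @$_), "\n" for sort_newest_first(@rows);
```

With these sample rows the PROD_2_5 line is printed first and the PROD_2_2 line last. Swapping `$a` and `$b` in the comparison gives the oldest-first order instead.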

Answer 1 (score: 1)

#!/bin/env perl
use warnings;
use strict; 
use autodie;
use open qw/ :std :encoding(UTF-8) /;

my $lookup_qfn = 'lookup.txt';
my $master_qfn = 'c73p1avrfusevrmtop.txt';

my %ids_to_lookup;
{
    open(my $fh, '<', $lookup_qfn);
    <$fh>;  # Header
    while (<$fh>) {
        chomp;
        my @fields = split /\t/;
        ++$ids_to_lookup{$fields[0]}{$fields[1]};
    }
}

my @output;
{
    open(my $fh, '<', $master_qfn);
    <$fh>;  # Header
    print(join("\t", "IRR ID", "Rev ID", "Date Released"), "\n");
    while (<$fh>) {
        chomp;
        my @fields = split /\t/;
        next if !$ids_to_lookup{$fields[1]}{$fields[3]};

        print(join("\t", @fields[3,2,5]), "\n");
    }
}

Answer 2 (score: 0)

I am not sure how to sort the dates properly, but apart from that, this does it:

use warnings;
use strict;


open my $file1, '<', 'in1.txt' or die $!;
open my $file2, '<', 'in2.txt' or die $!;

# Lookup table: name => IRR ID
my %data;
while(<$file1>){
    chomp;
    next if /^Name/;
    my @split = split(/\s+/);
    next unless @split >= 2;    # skip blank lines
    $data{$split[0]} = $split[1];
}

my %match;
while(<$file2>){
    chomp;
    next if /^Type/;
    my @split = split;
    next unless @split >= 9;    # skip blank or short lines
    my ($name, $rev_id, $irr_id, $date, $time) = @split[1, 2, 3, 7, 8];

    # Keep only rows whose name and IRR ID appear in the lookup table
    next unless exists $data{$name} and $data{$name} eq $irr_id;

    push @{$match{$name}{$date}}, [ $rev_id, $irr_id, $time ];
}

foreach my $name (keys %match){
    # Plain string sort: chronological only if the dates look like YYYY-MM-DD
    foreach my $date (sort keys %{ $match{$name} }){
        foreach my $row (@{ $match{$name}{$date} }){
            my ($rev_id, $irr_id, $time) = @$row;
            print "$irr_id\t$rev_id\t$date\t$time\n";
        }
    }
}
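
On the open question of sorting the dates: the asker's real output shows values such as `2015-05-03  6:59:09`, where the hour is not zero-padded, so comparing the date and time as plain strings can misorder releases from the same day. A sketch of one way to get sortable values with Time::Piece follows, assuming exactly that `YYYY-MM-DD H:MM:SS` layout. `release_epoch` is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: converts the date and time fields of a master-file
# row (as produced by splitting on whitespace) into epoch seconds.
sub release_epoch {
    my ($date, $time) = @_;
    return Time::Piece->strptime("$date $time", '%Y-%m-%d %H:%M:%S')->epoch;
}

# Sample rows taken from the output shown in the question
my @rows = (
    [ 99034201, 'PROD_1_0', '2014-01-17', '10:15:02' ],
    [ 99034201, 'PROD_2_5', '2015-05-03', '6:59:09'  ],
    [ 99034201, 'PROD_2_0', '2014-12-20', '21:09:06' ],
);

# Oldest release first; swap $a and $b for newest first
my @sorted = sort {
    release_epoch(@$a[2, 3]) <=> release_epoch(@$b[2, 3])
} @rows;

print join("\t", @$_), "\n" for @sorted;
```

This reparses the dates on every comparison, which is fine for a few hundred rows. For larger files it may be worth computing the epoch once per row and storing it alongside the other fields.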