Compare 4 files line by line to see if they match

Time: 2019-06-21 20:09:15

Tags: bash shell file perl

I am trying to compare the counts on each line of 4 text files:

file1.txt:
32
44
75
22
88

file2.txt
32
44
75
22
88

file3.txt
11
44
75
22
77

file4.txt
    32
    44
    75
    22
    88

Each line represents a header:

line1 = customerID count
line2 = employeeID count
line3 = active_users
line4 = inactive_users
line5 = deleted_users

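I am trying to compare file2.txt, file3.txt, and file4.txt against file1.txt; file1.txt will always have the correct counts.

Example: since file2.txt matches file1.txt exactly, line by line, in the example above, I am trying to output "file2.txt is good". But since lines 1 and 5 of file3.txt do not match file1.txt, I am trying to output "customerID for file3.txt does not match by 21 records" (32 - 11 = 21) and "deleted_users in file3.txt does not match by 11 records" (88 - 77 = 11).

If shell is easier, that is fine too.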

5 Answers:

Answer 0 (score: 3)

One way is to process the files in parallel:

use warnings;
use strict;
use feature 'say';

my @files = @ARGV;
#my @files = map { $_ . '.txt' } qw(f1 f2 f3 f4);  # my test files' names

# Open all files, filehandles in @fhs
my @fhs = map { open my $fh, '<', $_ or die "Can't open $_: $!"; $fh } @files;

# For reporting, enumerate file names
my %files = map { $_ => $files[$_] } 0..$#files;

# Process (compare) the same line from all files
my $line_cnt;
LINE: while ( my @line = map { my $line = <$_>; $line } @fhs ) {
    defined || last LINE for @line;
    ++$line_cnt;
    s/(?:^\s+|\s+$)//g for @line;
    for my $i (1..$#line) {
        if ($line[0] != $line[$i]) {
            say "File $files[$i] differs at line $line_cnt";
        }
    }
}

This compares each whole line with == (after stripping leading and trailing whitespace), since it is given that each line carries a single number to compare.

With my test files f1.txt, ..., it prints:

File f3.txt differs at line 1
File f3.txt differs at line 5
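
Since the question says shell is acceptable, the same "read line N of every file at once" idea can be sketched with paste and awk. This is only an illustration, not part of the answer above: report_diffs is a hypothetical helper name, and the sketch assumes file names without spaces and lines that contain no commas. paste -d, joins line N of every file into one comma-separated row, and awk compares each column numerically against the first, so leading whitespace is ignored.

```shell
# Hypothetical helper: report which files differ from the first (reference) file.
# Usage: report_diffs ref.txt other1.txt [other2.txt ...]
# Assumes file names without spaces and data lines without commas.
report_diffs() {
    # Row N of the paste output holds line N of every file, comma-separated
    paste -d, "$@" | awk -F',' -v names="$*" '
        BEGIN { split(names, file, " ") }   # file[i] is the name behind column i
        {
            # Adding 0 forces a numeric comparison and ignores leading whitespace
            for (i = 2; i <= NF; i++)
                if ($1 + 0 != $i + 0)
                    printf "File %s differs at line %d\n", file[i], NR
        }'
}
```

Called as report_diffs f1.txt f2.txt f3.txt f4.txt on the sample data from the question, this reports lines 1 and 5 of f3.txt. Unlike the Perl version, a file that is shorter than the reference shows up as an empty column, which compares as 0.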

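Answer 1 (score: 1)

Store the line names in one array and the correct values in another. Then loop over the files and, for each file, read its lines and compare them against the stored correct values. You can use the special variable $., which holds the line number of the last accessed filehandle, as an index into the arrays. Lines start at 1 while arrays start at 0, so we need to subtract 1 to get the correct index.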

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my @line_names = ('customerID count',
                  'employeeID count',
                  'active_users',
                  'inactive_users',
                  'deleted_users');

my @correct;
open my $in, '<', shift or die $!;
while (<$in>) {
    chomp;
    push @correct, $_;
}

while (my $file = shift) {
    open my $in, '<', $file or die $!;
    while (<$in>) {
        chomp;
        if ($_ != $correct[$. - 1]) {
            say "$line_names[$. - 1] in $file does not match by ",
                $correct[$. - 1] - $_, ' records';
        }
    }
}
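
For readers who prefer shell, the two-array approach maps fairly directly onto awk, where FNR restarts at 1 for every input file much like $. does per filehandle. The following is a sketch under assumptions, not a translation the answer's author provided: check_counts is a hypothetical name, the first argument is taken to be the reference file, and the reference file is assumed to be non-empty (otherwise the NR == FNR test would also match the second file).

```shell
# Hypothetical helper: compare every later file against the first one.
# Usage: check_counts file1.txt file2.txt [file3.txt ...]
check_counts() {
    awk '
        BEGIN {
            # Line names indexed by line number (split fills name[1]..name[5])
            split("customerID count|employeeID count|active_users|inactive_users|deleted_users", name, "|")
        }
        # First file only: remember the correct value for each line number
        NR == FNR { correct[FNR] = $1 + 0; next }
        # Later files: report the signed difference when the value is off
        ($1 + 0) != correct[FNR] {
            printf "%s in %s does not match by %d records\n", name[FNR], FILENAME, correct[FNR] - $1
        }
    ' "$@"
}
```

Because awk arrays filled by split start at 1, no index adjustment is needed here, in contrast to the $. - 1 in the Perl code.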

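Answer 2 (score: 1)

Read the first file into an array, then loop over the other files, reading each into an array with the same function. Inside this loop, go through each line, compute the diff, and if the diff is nonzero, print a message using the text from @names.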

#!/usr/bin/perl

use strict;
use warnings;

my @names = qw(customerID_count employeeID_count active_users inactive_users deleted_users);
my @files = qw(file1.txt file2.txt file3.txt file4.txt);

my @first = readfile($files[0]);

for (my $i = 1; $i <= $#files; $i++) {
    print "\n$files[0] <=> $files[$i]:\n";
    my @second = readfile($files[$i]);
    for (my $j = 0; $j <= $#names; $j++) {
        my $diff = $first[$j] - $second[$j];
        $diff = -$diff if $diff < 0;
        if ($diff > 0) {
            print "$names[$j] does not match by $diff records\n";
        }
    }
}

sub readfile {
    my ($file) = @_;
    open my $handle, '<', $file or die "Cannot open $file: $!";
    chomp(my @lines = <$handle>);
    close $handle;
    s/\s+//g for @lines;    # strip all whitespace from each line in place
    return @lines;
}

The output is:

file1.txt <=> file2.txt:

file1.txt <=> file3.txt:
customerID_count does not match by 21 records
deleted_users does not match by 11 records

file1.txt <=> file4.txt:

Answer 3 (score: 1)

A mishmash of bash, mostly GNU versions of standard utils like diff, sdiff, and sed, plus the ifne util (from moreutils), and even an eval:

f=("" "customerID count" "employeeID count" \
   "active_users" "inactive_users" "deleted_users")
for n in file{2..4}.txt ; do 
    diff -qws file1.txt $n || 
    $(sdiff file1.txt $n | ifne -n exit | nl |
      sed -n '/|/{s/[1-5]/${f[&]}/;s/\s*|\s*/-/;s/\([0-9-]*\)$/$((&))/;p}' | 
      xargs printf 'eval echo "%s for '"$n"' does not match by %s records.";\n') ; 
done

Output:

Files file1.txt and file2.txt are identical
Files file1.txt and file3.txt differ
customerID count for file3.txt does not match by 21 records.
deleted_users for file3.txt does not match by 11 records.
Files file1.txt and file4.txt are identical

The same code, tweaked for prettier output:

f=("" "customerID count" "employeeID count" \
   "active_users" "inactive_users" "deleted_users")
for n in file{2..4}.txt ; do 
    diff -qws file1.txt $n || 
    $(sdiff file1.txt $n | ifne -n exit | nl |
      sed -n '/|/{s/[1-5]/${f[&]}/;s/\s*|\s*/-/;s/\([0-9-]*\)$/$((&))/;p}' | 
      xargs printf 'eval echo "%s does not match by %s records.";\n') ; 
done  | 
sed '/^Files/!s/^/\t/;/^Files/{s/.* and //;s/ are .*/ is good/;s/ differ$/:/}'

Output:

file2.txt is good
file3.txt:
    customerID count does not match by 21 records.
    deleted_users does not match by 11 records.
file4.txt is good

Answer 4 (score: 0)

Here is an example in Perl:

use feature qw(say);
use strict;
use warnings;

{
    my $ref = read_file('file1.txt');
    my $N = 3;
    my @value_info;
    for my $i (1..$N) {
        my $fn = 'file'.($i+1).'.txt';
        my $values = read_file( $fn );
        push @value_info, [ $fn, $values];
    }
    my @labels = qw(customerID employeeID active_users inactive_users deleted_users);
    for my $info (@value_info) {
        my ( $fn, $values ) = @$info;
        my $all_ok = 1;
        my $j = 0;
        for my $value (@$values) {
            if ( $value != $ref->[$j] ) {
                printf "%s: %s does not match by %d records\n",
                  $fn, $labels[$j], abs( $value - $ref->[$j] );
                $all_ok = 0;
            }
            $j++;
        }
        say "$fn: is good" if $all_ok;
    }
}

sub read_file {
    my ( $fn ) = @_;

    my @values;
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    while( my $line = <$fh>) {
        if ( $line =~ /(\d+)/) {
            push @values, $1;
        }
    }
    close $fh;
    return \@values;
}

Output:

file2.txt: is good
file3.txt: customerID does not match by 21 records
file3.txt: deleted_users does not match by 11 records
file4.txt: is good
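
The per-file verdict ("is good" versus a list of mismatches) can also be sketched in plain shell by checking one file at a time and testing whether any mismatch lines were produced. This is an illustrative sketch only: summarize_counts is a hypothetical name, the first argument is assumed to be the non-empty reference file, and the labels are copied from the Perl code above.

```shell
# Hypothetical helper: print a verdict for each file compared to the reference.
# Usage: summarize_counts file1.txt file2.txt [file3.txt ...]
# The body runs in a subshell ( ... ) so ref, f and report stay local.
summarize_counts() (
    ref=$1
    shift
    for f in "$@"; do
        # Collect mismatch lines for this one file; an empty result means it matched
        report=$(awk '
            BEGIN { split("customerID employeeID active_users inactive_users deleted_users", label, " ") }
            NR == FNR { want[FNR] = $1 + 0; next }
            ($1 + 0) != want[FNR] {
                d = $1 - want[FNR]
                if (d < 0) d = -d          # absolute difference
                printf "%s: %s does not match by %d records\n", FILENAME, label[FNR], d
            }' "$ref" "$f") || exit 1
        if [ -n "$report" ]; then
            printf '%s\n' "$report"
        else
            printf '%s: is good\n' "$f"
        fi
    done
)
```

Run as summarize_counts file1.txt file2.txt file3.txt file4.txt on the sample data, this produces the same four lines as the Perl output above.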