I am trying to compare the counts on each line of 4 text files:
file1.txt:
32
44
75
22
88
file2.txt
32
44
75
22
88
file3.txt
11
44
75
22
77
file4.txt
32
44
75
22
88
Each line represents a heading:
line1 = customerID count
line2 = employeeID count
line3 = active_users
line4 = inactive_users
line5 = deleted_users
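I am trying to compare file2.txt, file3.txt and file4.txt against file1.txt; file1.txt will always have the correct counts.
Example: since file2.txt matches file1.txt exactly, line for line, in the example above, I am trying to output "file2.txt is good". Since lines 1 and 5 of file3.txt do not match file1.txt, I am trying to output "customerID for file3.txt does not match by 21 records" (i.e. 32-11 = 21) and "deleted_users in file3.txt does not match by 11 records" (88-77 = 11).
If shell is easier, that is fine too.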
Answer 0 (score: 3)
One way is to process the files in parallel:
use warnings;
use strict;
use feature 'say';
my @files = @ARGV;
#my @files = map { $_ . '.txt' } qw(f1 f2 f3 f4);  # my test files' names
# Open all files, filehandles in @fhs
my @fhs = map { open my $fh, '<', $_ or die "Can't open $_: $!"; $fh } @files;
# For reporting, enumerate file names
my %files = map { $_ => $files[$_] } 0..$#files;
# Process (compare) the same line from all files
my $line_cnt;
LINE: while ( my @line = map { my $line = <$_>; $line } @fhs )
{
    # Stop as soon as any file runs out of lines
    defined || last LINE for @line;
    ++$line_cnt;
    # Strip leading and trailing whitespace
    s/(?:^\s+|\s+$)//g for @line;
    for my $i (1..$#line) {
        if ($line[0] != $line[$i]) {
            say "File $files[$i] differs at line $line_cnt";
        }
    }
}
This compares the whole line with == (after leading and trailing whitespace is removed), since we have established that each line carries a single number to compare.
With my test files f1.txt, ... it prints
File f3.txt differs at line 1
File f3.txt differs at line 5
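Since the question also welcomes shell, the same parallel walk can be sketched with paste, which joins the corresponding lines of several files into one tab-separated row, plus awk to compare the columns. This is only a sketch: the function name report_diff_lines is made up for illustration, and it assumes one number per line and file names without spaces.

```shell
#!/bin/sh
# report_diff_lines REF FILE...  (hypothetical helper name, not from the answer)
# Compares every FILE against REF, row by row, and reports differing lines.
report_diff_lines() {
    # paste puts line N of every file on one row, separated by tabs.
    paste "$@" | awk -F '\t' -v names="$*" '
        # Assumption: file names contain no spaces, so they split cleanly.
        BEGIN { split(names, file, " ") }
        {
            # Column 1 is the reference; adding 0 forces a numeric comparison.
            for (i = 2; i <= NF; i++)
                if (($1 + 0) != ($i + 0))
                    printf "File %s differs at line %d\n", file[i], NR
        }'
}
```

Called as report_diff_lines file1.txt file2.txt file3.txt file4.txt on the question's data, it reports lines 1 and 5 of file3.txt. Unlike the Perl loop, paste keeps going until the longest file ends, so a shorter file shows up as differing (its missing value counts as 0).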
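Answer 1 (score: 1)
Store the line names in one array and the correct values in another. Then loop over the files, and for each file read its lines and compare them to the stored correct values. You can use the special variable $., which contains the line number of the last accessed filehandle, as the array index. Lines are numbered from 1 and arrays are indexed from 0, so we need to subtract 1 to get the correct index.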
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my @line_names = ('customerID count',
                  'employeeID count',
                  'active_users',
                  'inactive_users',
                  'deleted_users');
my @correct;
# The first argument is the reference file with the correct counts
open my $in, '<', shift or die $!;
while (<$in>) {
    chomp;
    push @correct, $_;
}
# Every remaining argument is a file to check against the reference
while (my $file = shift) {
    open my $in, '<', $file or die $!;
    while (<$in>) {
        chomp;
        if ($_ != $correct[$. - 1]) {
            say "$line_names[$. - 1] in $file does not match by ",
                $correct[$. - 1] - $_, ' records';
        }
    }
}
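For the shell route mentioned in the question, the same idea can be sketched with awk, whose FNR (the line number within the current file) plays the role of Perl's $. variable. This is a sketch under stated assumptions, not part of the answer above: the function name compare_counts is hypothetical, the reference file is assumed to be non-empty, and the "is good" line is added because the question asks for it.

```shell
#!/bin/sh
# compare_counts REF FILE...  (hypothetical helper name)
# REF holds the correct counts; every FILE is compared against it.
compare_counts() {
    awk '
        BEGIN {
            # Labels for lines 1..5, as listed in the question.
            split("customerID count|employeeID count|active_users|inactive_users|deleted_users", name, "|")
        }
        # First file: remember the correct value for each line number.
        NR == FNR { correct[FNR] = $1 + 0; next }
        # Start of a later file: report on the previous one, then reset.
        FNR == 1 { report(); file = FILENAME; bad = 0 }
        ($1 + 0) != correct[FNR] {
            diff = correct[FNR] - $1
            if (diff < 0) diff = -diff
            printf "%s in %s does not match by %d records\n", name[FNR], FILENAME, diff
            bad = 1
        }
        END { report() }
        # Print the all-clear message for a file that had no mismatches.
        function report() { if (file != "" && !bad) print file " is good" }
    ' "$@"
}
```

Usage: compare_counts file1.txt file2.txt file3.txt file4.txt. Indexing both arrays by FNR avoids the subtract-1 step, since awk arrays filled this way start at 1.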
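Answer 2 (score: 1)
Read the first file into an array, then loop over the other files, reading each into an array with the same function. Inside that loop, go through the lines, calculate the diff, and if the diff is not zero, print a message using the text from @names.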
#!/usr/bin/perl
use strict;
use warnings;
my @names = qw(customerID_count employeeID_count active_users inactive_users deleted_users);
my @files = qw(file1.txt file2.txt file3.txt file4.txt);
my @first = readfile($files[0]);
for (my $i = 1; $i <= $#files; $i++) {
    print "\n$files[0] <=> $files[$i]:\n";
    my @second = readfile($files[$i]);
    for (my $j = 0; $j <= $#names; $j++) {
        # Absolute difference between the reference count and this file's count
        my $diff = abs($first[$j] - $second[$j]);
        if ($diff > 0) {
            print "$names[$j] does not match by $diff records\n";
        }
    }
}
# Read a file and return its lines with all whitespace removed
sub readfile {
    my ($file) = @_;
    open my $handle, '<', $file or die "Can't open $file: $!";
    chomp(my @lines = <$handle>);
    close $handle;
    s/\s+//g for @lines;
    return @lines;
}
The output is:
file1.txt <=> file2.txt:
file1.txt <=> file3.txt:
customerID_count does not match by 21 records
deleted_users does not match by 11 records
file1.txt <=> file4.txt:
Answer 3 (score: 1)
A mash-up of bash and mostly GNU versions of standard utilities such as diff, sdiff, sed, and so on, plus the ifne util, and even an eval:
f=("" "customerID count" "employeeID count" \
   "active_users" "inactive_users" "deleted_users")
for n in file{2..4}.txt ; do
    diff -qws file1.txt $n ||
    $(sdiff file1.txt $n | ifne -n exit | nl |
      sed -n '/|/{s/[1-5]/${f[&]}/;s/\s*|\s*/-/;s/\([0-9-]*\)$/$((&))/;p}' |
      xargs printf 'eval echo "%s for '"$n"' does not match by %s records.";\n') ;
done
Output:
Files file1.txt and file2.txt are identical
Files file1.txt and file3.txt differ
customerID count for file3.txt does not match by 21 records.
deleted_users for file3.txt does not match by 11 records.
Files file1.txt and file4.txt are identical
The same code, tweaked for prettier output:
f=("" "customerID count" "employeeID count" \
   "active_users" "inactive_users" "deleted_users")
for n in file{2..4}.txt ; do
    diff -qws file1.txt $n ||
    $(sdiff file1.txt $n | ifne -n exit | nl |
      sed -n '/|/{s/[1-5]/${f[&]}/;s/\s*|\s*/-/;s/\([0-9-]*\)$/$((&))/;p}' |
      xargs printf 'eval echo "%s does not match by %s records.";\n') ;
done |
sed '/^Files/!s/^/\t/;/^Files/{s/.* and //;s/ are .*/ is good/;s/ differ$/:/}'
Output:
file2.txt is good
file3.txt:
customerID count does not match by 21 records.
deleted_users does not match by 11 records.
file4.txt is good
Answer 4 (score: 0)
Here is an example in Perl:
use feature qw(say);
use strict;
use warnings;
{
    # file1.txt holds the reference (correct) counts
    my $ref = read_file('file1.txt');
    my $N = 3;
    my @value_info;
    for my $i (1..$N) {
        my $fn = 'file'.($i+1).'.txt';
        my $values = read_file( $fn );
        push @value_info, [ $fn, $values];
    }
    my @labels = qw(customerID employeeID active_users inactive_users deleted_users);
    for my $info (@value_info) {
        my ( $fn, $values ) = @$info;
        my $all_ok = 1;
        my $j = 0;
        for my $value (@$values) {
            if ( $value != $ref->[$j] ) {
                printf "%s: %s does not match by %d records\n",
                    $fn, $labels[$j], abs( $value - $ref->[$j] );
                $all_ok = 0;
            }
            $j++;
        }
        say "$fn: is good" if $all_ok;
    }
}
# Return a reference to an array with the first number found on each line
sub read_file {
    my ( $fn ) = @_;
    my @values;
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    while( my $line = <$fh>) {
        if ( $line =~ /(\d+)/) {
            push @values, $1;
        }
    }
    close $fh;
    return \@values;
}
Output:
file2.txt: is good
file3.txt: customerID does not match by 21 records
file3.txt: deleted_users does not match by 11 records
file4.txt: is good