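I basically want to do an unordered diff between two text files (CSV style), comparing only the fields in the first two columns (I don't care about the value in the third column). I then want to print the lines that file1.txt has but file2.txt does not, and vice versa for file2.txt compared with file1.txt.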
FILE1.TXT:
cat,val 1,43432
cat,val 2,4342
dog,value,23
cat2,value,2222
hedgehog,input,233
FILE2.TXT:
cat2,value,312
cat,val 2,11
cat,val 3,22
dog,value,23
hedgehog,input,2145
bird,output,9999
The output would look like this:
file1.txt:
cat,val 1,43432
file2.txt:
cat,val 3,22
bird,output,9999
I'm new to Perl, so the better, less ugly ways of doing this are beyond what I currently know. Thanks for any help.
Current code:
#!/usr/bin/perl -w
use Cwd;
use strict;
use Data::Dumper;
use Getopt::Long;
my $myName = 'MyDiff.pl';
my $usage = "$myName is blah blah blah";
# retrieve the command line options, set up the environment
use vars qw($file1 $file2);
#grab the specified values or exit program
GetOptions("file1=s" => \$file1,
"file2=s" => \$file2)
or die $usage;
( $file1 and $file2 ) or die $usage;
open (FH, "< $file1") or die "Can't open $file1 for read: $!";
my @array1 = <FH>;
close FH or die "Cannot close $file1: $!";
open (FH, "< $file2") or die "Can't open $file2 for read: $!";
my @array2 = <FH>;
close FH or die "Cannot close $file2: $!";
#...do a sort and match
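Answer 0 (score: 4)
Use a hash for each file, with the first two columns as the key. Once you have both hashes, you can iterate over one and delete the entries they have in common; whatever is left in each hash is what you are looking for.
Initialize,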
my %hash1 = ();
my %hash2 = ();
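Read in the first file, join the first two columns to form the key, and store the line in the hash under that key. This assumes the fields are comma-separated. You could also use a CSV module.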
open( my $fh1, "<", $file1 ) || die "Can't open $file1: $!";
while(my $line = <$fh1>) {
chomp $line;
# join first two columns for key
my $key = join ",", (split ",", $line)[0,1];
# create hash entry for file1
$hash1{$key} = $line;
}
Do the same for file2 to create %hash2
open( my $fh2, "<", $file2 ) || die "Can't open $file2: $!";
while(my $line = <$fh2>) {
chomp $line;
# join first two columns for key
my $key = join ",", (split ",", $line)[0,1];
# create hash entry for file2
$hash2{$key} = $line;
}
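The two read loops above differ only in the file and hash names, so they could be folded into one helper. This is a minimal sketch, assuming every non-blank line has at least two comma-separated fields; `read_keyed` is a hypothetical name and not part of the answer itself.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the answer): read a
# comma-separated file and return a hash reference keyed on the
# first two columns, with the full line as the value.
sub read_keyed {
    my ($path) = @_;
    my %lines;
    open( my $fh, '<', $path ) or die "Can't open $path: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my $key = join ',', ( split /,/, $line )[ 0, 1 ];
        $lines{$key} = $line;
    }
    close $fh or die "Can't close $path: $!";
    return \%lines;
}
```

With it, the two hashes could be filled as `my %hash1 = %{ read_keyed($file1) };` and `my %hash2 = %{ read_keyed($file2) };`.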
Now go through the entries and delete the common ones,
foreach my $key (keys %hash1) {
if (exists $hash2{$key}) {
# common entry, delete from both hashes
delete $hash1{$key};
delete $hash2{$key};
}
}
%hash1 will now contain the lines that are only in file1, and %hash2 the lines that are only in file2.
You can print them out,
foreach my $key (keys %hash1) {
print "$hash1{$key}\n";
}
foreach my $key (keys %hash2) {
print "$hash2{$key}\n";
}
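To see the deletion step end to end, here is a small sketch that runs it on the sample rows from the question, held in memory rather than read from disk. The file-name labels and the sorted key order are additions for readability; hash order is otherwise unspecified, so the lines come out sorted by key rather than in file order.

```perl
use strict;
use warnings;

# Sample rows from the question, already keyed on the first two columns.
my %hash1 = (
    'cat,val 1'      => 'cat,val 1,43432',
    'cat,val 2'      => 'cat,val 2,4342',
    'dog,value'      => 'dog,value,23',
    'cat2,value'     => 'cat2,value,2222',
    'hedgehog,input' => 'hedgehog,input,233',
);
my %hash2 = (
    'cat2,value'     => 'cat2,value,312',
    'cat,val 2'      => 'cat,val 2,11',
    'cat,val 3'      => 'cat,val 3,22',
    'dog,value'      => 'dog,value,23',
    'hedgehog,input' => 'hedgehog,input,2145',
    'bird,output'    => 'bird,output,9999',
);

# Delete every key the two hashes share.
for my $key ( keys %hash1 ) {
    if ( exists $hash2{$key} ) {
        delete $hash1{$key};
        delete $hash2{$key};
    }
}

# Build a labelled report; sorting the keys keeps the output stable.
my $report = "file1.txt:\n";
$report .= "$hash1{$_}\n" for sort keys %hash1;
$report .= "file2.txt:\n";
$report .= "$hash2{$_}\n" for sort keys %hash2;
print $report;
```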
Answer 1 (score: 2)
Perhaps the following will help:
use strict;
use warnings;
my @files = @ARGV;
pop;
my %file1 = map { chomp; /(.+),/; $1 => $_ } <>;
push @ARGV, $files[1];
my %file2 = map { chomp; /(.+),/; $1 => $_ } <>;
print "$files[0]:\n";
print $file1{$_}, "\n" for grep !exists $file2{$_}, keys %file1;
print "\n$files[1]:\n";
print $file2{$_}, "\n" for grep !exists $file1{$_}, keys %file2;
Usage: perl script.pl file1.txt file2.txt
Output on your data set:
file1.txt:
cat,val 1,43432
file2.txt:
cat,val 3,22
bird,output,9999
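This builds a hash for each file. The keys are the first two columns and the associated values are the full lines. grep is used to filter out the shared keys.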
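The key extraction relies on the pattern /(.+),/ being greedy: it captures everything up to the last comma, which is the first two columns only when a line has exactly three fields. The sketch below illustrates that behaviour; `key_for` is a hypothetical helper, and the stricter pattern at the end is one possible alternative, not something the answer uses.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: return what /(.+),/ captures,
# or undef when the line contains no comma at all.
sub key_for {
    my ($line) = @_;
    return $line =~ /(.+),/ ? $1 : undef;
}

my $three = key_for('cat,val 1,43432');      # three fields
my $four  = key_for('cat,val 1,43432,x');    # four fields
my $none  = key_for('no commas here');       # no match

# A stricter pattern that always stops after the second field.
my ($strict) = 'cat,val 1,43432,x' =~ /^([^,]*,[^,]*),/;

print "three fields: $three\n";     # cat,val 1
print "four fields:  $four\n";      # cat,val 1,43432
print "strict:       $strict\n";    # cat,val 1
print "no comma:     ", ( defined $none ? $none : 'undef' ), "\n";
```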
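Edit: On relatively small files, using map as above to process the files' lines will work fine. However, a list of all the file's lines is created first and then passed to map. On larger files, it may be better to use a while (<>) { ... construct, which reads one line at a time. The code below does this, producing the same output as above, and uses a hash of hashes (HoH). Because it uses a HoH, you'll notice some dereferencing: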
use strict;
use warnings;
my %hash;
my @files = @ARGV;
while (<>) {
chomp;
$hash{$ARGV}{$1} = $_ if /(.+),/;
}
print "$files[0]:\n";
print $hash{ $files[0] }{$_}, "\n"
for grep !exists $hash{ $files[1] }{$_}, keys %{ $hash{ $files[0] } };
print "\n$files[1]:\n";
print $hash{ $files[1] }{$_}, "\n"
for grep !exists $hash{ $files[0] }{$_}, keys %{ $hash{ $files[1] } };
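For readers new to the dereferencing in that code, this sketch builds a small hash of hashes by hand with the same shape (outer key is the file name, inner key is the first two columns) and shows the two access forms the answer uses. The data here is a hand-picked subset chosen for illustration.

```perl
use strict;
use warnings;

# Same shape as %hash above: file name => { key => full line }.
my %hash = (
    'file1.txt' => {
        'cat,val 1' => 'cat,val 1,43432',
        'dog,value' => 'dog,value,23',
    },
    'file2.txt' => {
        'dog,value' => 'dog,value,23',
    },
);

# $hash{'file1.txt'} is a hash reference, so it has to be dereferenced
# with %{ ... } before keys can walk it.
my @file1_keys = sort keys %{ $hash{'file1.txt'} };

# Chained subscripts reach an inner value directly.
my $line = $hash{'file1.txt'}{'cat,val 1'};

# Keys present in file1.txt that file2.txt lacks.
my @only_in_file1 = grep { !exists $hash{'file2.txt'}{$_} } @file1_keys;
print "$_\n" for @only_in_file1;
```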
Answer 2 (score: 0)
I think the problem above can be solved with either of the following approaches:
a) Use hashes, as mentioned above.
b) Sort and compare:
   1. Sort both files by key1 and key2 (using the sort function).
   2. Iterate through FILE1:
      - Match the key1 and key2 entry of the FILE1 row against FILE2.
      - If they match, take whatever action is required for common lines (for example, print them to the desired file) and move to the next row in FILE1 (continue with the loop).
      - If they do not match, iterate through FILE2, starting from POS-FILE2, until a match is found:
        - Match the key1 and key2 entry of the FILE1 row against the FILE2 row.
        - If they match, take the action required for common lines and exit the loop, noting the position in FILE2 (POS-FILE2).
        - If they do not match, print the unmatched line to the desired file as required and move to the next row in FILE2. When FILE2 runs out, set FILE2-END to true.
   3. If FILE2-END is true, the rest of the lines in FILE1 do not exist in FILE2.
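Option b) can be sketched as a single merge pass over two sorted lists. This is one possible reading of the steps above, not the answerer's own code: it assumes each key appears at most once per file, holds the sample rows in memory, and uses the hypothetical names `key_of` and `sorted_diff`.

```perl
use strict;
use warnings;

# Hypothetical helper: key on the first two comma-separated fields.
sub key_of { return join ',', ( split /,/, $_[0] )[ 0, 1 ] }

# Walk two lists of lines, each sorted by key, in one pass.
# Returns two array refs: lines only in the first list, lines only in the second.
sub sorted_diff {
    my ( $rows1, $rows2 ) = @_;
    my ( @only1, @only2 );
    my ( $i, $j ) = ( 0, 0 );
    while ( $i < @$rows1 && $j < @$rows2 ) {
        my $cmp = key_of( $rows1->[$i] ) cmp key_of( $rows2->[$j] );
        if    ( $cmp == 0 ) { $i++; $j++ }                       # common key
        elsif ( $cmp < 0 )  { push @only1, $rows1->[ $i++ ] }    # missing from list 2
        else                { push @only2, $rows2->[ $j++ ] }    # missing from list 1
    }
    # Whatever remains in either list has no partner in the other.
    push @only1, @{$rows1}[ $i .. $#$rows1 ];
    push @only2, @{$rows2}[ $j .. $#$rows2 ];
    return ( \@only1, \@only2 );
}

# Sample rows from the question, sorted with the same comparison the merge uses.
my @file1 = sort { key_of($a) cmp key_of($b) } (
    'cat,val 1,43432', 'cat,val 2,4342', 'dog,value,23',
    'cat2,value,2222', 'hedgehog,input,233',
);
my @file2 = sort { key_of($a) cmp key_of($b) } (
    'cat2,value,312', 'cat,val 2,11', 'cat,val 3,22',
    'dog,value,23', 'hedgehog,input,2145', 'bird,output,9999',
);

my ( $only1, $only2 ) = sorted_diff( \@file1, \@file2 );
print "file1.txt:\n";
print "$_\n" for @$only1;
print "file2.txt:\n";
print "$_\n" for @$only2;
```

Compared with the hash approach, this needs the sort up front but then only walks each list once, which may matter if the files are already sorted or too large to key comfortably in memory.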