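I have several CSV files and want to build a master file that lists each unique entry together with the input file(s) it came from. I can't work out how to create the per-file columns.
File 1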
fragment
accb
bbc
ccd
File 2
fragment
ccd
llk
kks
Desired output
fragment  file 1  file 2
accb      1       0
bbc       1       0
ccd       1       1
llk       0       1
kks       0       1
use strict;
use warnings;
use feature qw(say);
use autodie;
use constant {
    FILE_1 => "file1.csv",
    FILE_2 => "file2.csv",
};
my %hash;
#
# Load the Hash with value from File #1
#
open my $file1_fh, "<", FILE_1;
while ( my $value = <$file1_fh> ) {
    chomp $value;
    $hash{$value} = 1;
}
close $file1_fh;
#
# Add File #2 to the Hash
#
open my $file2_fh, "<", FILE_2;
while ( my $value = <$file2_fh> ) {
    chomp $value;
    $hash{$value} = 1;    # If that value was in "File #1", it will be "replaced"
}
close $file2_fh;
for my $value ( sort keys %hash ) {
    say $value;
}
Answer 0 (score: 0)
In this case, encoding the information in the hash values is a good approach. One way to do it is as follows:
my %hash;
#
# Load the Hash with value from File #1
#
open my $file1_fh, "<", FILE_1;
while ( my $value = <$file1_fh> ) {
    chomp $value;
    $hash{$value}++;        # an entry seen in file 1 gets the value 1
}
close $file1_fh;
#
# Add File #2 to the Hash
#
open my $file2_fh, "<", FILE_2;
while ( my $value = <$file2_fh> ) {
    chomp $value;
    $hash{$value} += 10;    # if the key already exists, the value will now be 11
                            # if it did not exist, the value will be 10
}
close $file2_fh;
# Columns printed: entry, present in file 1, present in file 2
for my $k ( sort keys %hash ) {
    if ( $hash{$k} == 1 ) {        # only in file 1
        say "$k\t1\t0";
    }
    elsif ( $hash{$k} == 10 ) {    # only in file 2
        say "$k\t0\t1";
    }
    else {                         # in both file 1 and file 2
        say "$k\t1\t1";
    }
}
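You can extend this approach to more files by using 100, 1000, 10000, and so on, provided each entry appears at most once per file (a repeated line would be counted twice and corrupt the sum).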
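As a rough sketch of how the same idea scales to many files, the variant below swaps the powers of ten for bit flags (1, 2, 4, ...), which is a different encoding from the one in the answer but follows the same principle. The helper names `flag_entries` and `flag_columns` are hypothetical, and the sketch assumes the entries have already been read into one array per file.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one array reference of entries per file and
# returns { entry => bitmask }, where bit $i is set if file $i had the entry.
sub flag_entries {
    my @lists = @_;
    my %flags;
    for my $i ( 0 .. $#lists ) {
        for my $entry ( @{ $lists[$i] } ) {
            # OR-ing a bit is idempotent, so repeated lines in one file are harmless
            $flags{$entry} = ( $flags{$entry} // 0 ) | ( 1 << $i );
        }
    }
    return \%flags;
}

# Decode a bitmask back into a list of 1/0 columns, one per file.
sub flag_columns {
    my ( $mask, $file_count ) = @_;
    return map { ( $mask >> $_ ) & 1 } 0 .. $file_count - 1;
}

# Example with the two sample files from the question (headers left out)
my $flags = flag_entries( [qw(accb bbc ccd)], [qw(ccd llk kks)] );
for my $entry ( sort keys %$flags ) {
    print join( "\t", $entry, flag_columns( $flags->{$entry}, 2 ) ), "\n";
}
```

Because each file owns exactly one bit, the mask stays unambiguous however often an entry repeats, which is not true of the additive scheme.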
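Another possibility is to build a more complex data structure that records the names of the files in which each entry appears, for example: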
for my $file (@array_of_files) {
    # report the file name, not the (still undefined) file handle
    open my $f, "<", $file or die "Could not open $file: $!";
    while ( my $l = <$f> ) {
        chomp($l);
        $hash{$l}{$file}++;    # store the file name
    }
    close $f;
}
This is useful if you have a large number of files, or if you want the hash data to be more descriptive and easier to understand.
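To show how the nested hash could be turned into the table from the question, here is a minimal sketch. The function names `load_presence` and `presence_rows` are hypothetical, and it assumes every input file starts with a one-line header (such as `fragment`) that should be skipped.

```perl
use strict;
use warnings;

# Build { entry => { file => count } } from a list of file names.
# Assumption: the first line of each file is a header and is discarded.
sub load_presence {
    my @files = @_;
    my %seen;
    for my $file (@files) {
        open my $fh, "<", $file or die "Could not open $file: $!";
        my $header = <$fh>;         # read and ignore the header line
        while ( my $line = <$fh> ) {
            chomp $line;
            next if $line eq "";    # ignore blank lines
            $seen{$line}{$file}++;
        }
        close $fh;
    }
    return \%seen;
}

# Turn the structure into tab-separated rows: the entry, then 1 or 0 per file.
sub presence_rows {
    my ( $seen, @files ) = @_;
    my @rows = ( join "\t", "fragment", @files );
    for my $entry ( sort keys %$seen ) {
        push @rows,
          join( "\t", $entry, map { exists $seen->{$entry}{$_} ? 1 : 0 } @files );
    }
    return @rows;
}

# Usage: perl script.pl file1.csv file2.csv > master.tsv
if (@ARGV) {
    print "$_\n" for presence_rows( load_presence(@ARGV), @ARGV );
}
```

Since the columns are generated from the file list itself, adding a third or fourth input needs no change to the code, only another file name on the command line.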