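Below is an example that uses a regex to search comma-separated files. Does anyone know how to convert the following code to a hash map search? When there is a match, the code should return the original lines from both files.
You don't have to use a hash map. Your solution can use any other faster way of searching an array, such as grep, a hash, smart search, first, and so on.
The files contain thousands of records. The goal is to find matching items in column 3 of file1.csv and column 4 of file2.csv. If there is a match, join the lines from the two files.
Update: I forgot to mention that $line1 should be printed if it matches nothing in the @data2 array.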
use strict;
use warnings;
use feature 'say';

my $data_file1 = "file1.csv"; # contains lines such as "james,smith,3 kids"
my $data_file2 = "file2.csv"; # contains lines such as "jim,jones,tall,3 kids"
my $handle1;
my (@data1, @data2);
my (@temp_data1, @temp_data2);
open $handle1, '<', $data_file1 or die "Can't open $data_file1: $!\n";
chomp(@data1 = <$handle1>);
close $handle1;
open $handle1, '<', $data_file2 or die "Can't open $data_file2: $!\n";
chomp(@data2 = <$handle1>);
close $handle1;
foreach my $line1 (@data1)
{
    @temp_data1 = split /,/ , $line1;
    my $not_found = 1;
    # Every line of file 2 is rescanned for every line of file 1: O(N * M).
    foreach my $line2 (@data2)
    {
        @temp_data2 = split /,/ , $line2;
        # Suffix match: field 3 of file 1 is interpolated as a regex pattern.
        if($temp_data2[3] =~ /$temp_data1[2]$/)
        {
            $not_found = 0;
            say $line1 .",". $line2;
        }
    }
    if($not_found)
    {
        say "$line1 was not found";
    }
}
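Answer 0 (score: 2)
Populate a hash using the key field as the hash key and the row as the value. Then go through the other file, looking up matches in the hash.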
use strict;
use warnings;
use Text::CSV_XS qw( );

@ARGV == 2
    or die("usage\n");

my ($data_file1, $data_file2) = @ARGV;

open(my $fh1, '<', $data_file1)
    or die("Can't open \"$data_file1\": $!\n");
open(my $fh2, '<', $data_file2)
    or die("Can't open \"$data_file2\": $!\n");

my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });

# Index the second file by its 4th column.
# If a key occurs more than once, the last row read wins.
my %data;
while ( my $row = $csv->getline($fh2) ) {
    $data{ $row->[3] } = $row;
}

# Look up the 3rd column of each row of the first file in the index.
while ( my $row = $csv->getline($fh1) ) {
    if ( my $linked_row = $data{ $row->[2] } ) {
        $csv->say(\*STDOUT, [ @$row, @$linked_row ]);  # joined row
    } else {
        $csv->say(\*STDERR, $row);                     # unpaired row
    }
}
Usage:
script file1.csv file2.csv >merged.csv 2>unpaired.csv
CPU: O(N + M) instead of O(N * M)
Memory: O(M) instead of O(N + M)
where N is the number of records in the first file and
M is the number of records in the second file.
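One difference from the question's nested loop is worth noting: a hash that stores a single row per key keeps only the last row for a repeated key, and a hash lookup is an exact comparison rather than a regex suffix match. The following is a minimal sketch, not part of the answer above, of a variant that stores every line per key in a hash of arrays and prints the question's "was not found" message. The helper name join_lines and the sample data are hypothetical, and the sketch assumes plain comma-separated lines with no quoted fields, so it uses split instead of Text::CSV_XS.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): joins two lists of CSV lines,
# matching field 3 of each first-file line against field 4 of the
# second-file lines. Assumes no quoted fields or embedded commas.
sub join_lines {
    my ($lines1, $lines2) = @_;

    # Map each key (4th field) to a list of every line that carries it.
    my %by_key;
    for my $line2 (@$lines2) {
        my $key = (split /,/, $line2)[3];
        next unless defined $key;          # skip lines with fewer than 4 fields
        push @{ $by_key{$key} }, $line2;
    }

    my @out;
    for my $line1 (@$lines1) {
        my $key     = (split /,/, $line1)[2];
        my $matches = defined $key ? $by_key{$key} : undef;
        if ($matches) {
            # One joined line per matching second-file line.
            push @out, "$line1,$_" for @$matches;
        } else {
            push @out, "$line1 was not found";
        }
    }
    return @out;
}

# Made-up sample data standing in for the two files.
my @file1 = ('james,smith,3 kids', 'ann,lee,no kids');
my @file2 = ('jim,jones,tall,3 kids', 'bob,ray,short,3 kids');
print "$_\n" for join_lines(\@file1, \@file2);
```

The lookup cost stays at O(N + M) plus the number of joined lines produced; memory is still proportional to the second file, since each of its lines is stored once.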
Answer 1 (score: 0)
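use strict;
use warnings;

my $data_file1 = "file1.csv"; # contains lines such as "james,smith,3 kids"
my $data_file2 = "file2.csv"; # contains lines such as "jim,jones,tall,3 kids"
my $handle1;
my %searchHash;

# Register the 3rd field of every line in file 1 with a match count of 0.
open $handle1, '<', $data_file1 or die "Can't open $data_file1: $!\n";
while (my $line = <$handle1>) {
    chomp($line);
    $searchHash{(split /,/,$line)[2]} = 0;
}
close $handle1;

# Count how often each registered key appears as the 4th field in file 2.
open $handle1, '<', $data_file2 or die "Can't open $data_file2: $!\n";
while (my $line = <$handle1>) {
    chomp($line);
    my $key = (split /,/,$line)[3];
    $searchHash{$key}++ if(defined $key && exists $searchHash{$key});
}
close $handle1;

# Print each key with its match count; a count of 0 means no match was found.
foreach my $key (keys %searchHash) {
    print "$key ($searchHash{$key})\n";
}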