Question

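Below is an example that uses a regex to search comma-separated files. Does anyone know how to convert the following code to a hash map search? If there is a match, the code should return the original lines from both files.

You do not have to use a hash map. Your solution can use any other faster way of searching an array, such as grep, hash, smart search, first, etc.

There are thousands of records in these files. The goal is to find matching items in column 3 of file1.csv and column 4 of file2.csv. If there is a match, join the lines from the two files.

Update: I forgot to mention that if $line1 does not match anything in the @data2 array, it should still print $line1.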

use strict;
use warnings;
use feature 'say';

my $data_file1 = "file1.csv";  # contains lines such as "james,smith,3 kids"
my $data_file2 = "file2.csv";  # contains lines such as "jim,jones,tall,3 kids"

my $handle1;
my (@data1, @data2);
my (@temp_data1, @temp_data2);

open $handle1, '<', $data_file1 or die "Can't open $data_file1: $!";
chomp(@data1 = <$handle1>);
close $handle1;

open $handle1, '<', $data_file2 or die "Can't open $data_file2: $!";
chomp(@data2 = <$handle1>);
close $handle1;

foreach my $line1 (@data1)
{
    @temp_data1 = split /,/ , $line1;
    my $not_found = 1;
    foreach my $line2 (@data2)
    {
        @temp_data2 = split /,/ , $line2;

        # Match when column 4 of file 2 ends with column 3 of file 1
        if($temp_data2[3] =~ /$temp_data1[2]$/)
        {
            $not_found = 0;
            say $line1 .",". $line2;
        }
    }
    if($not_found)
    {
        say "$line1 was not found";
    }
}

Answer 1

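Populate a hash using the key field as the hash key and the row as the value. Then go through the other file, looking for matches in the hash.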

use Text::CSV_XS qw( );

@ARGV == 2
   or die("usage\n");

my ($data_file1, $data_file2) = @ARGV;

open(my $fh1, '<', $data_file1)
   or die("Can't open \"$data_file1\": $!\n");
open(my $fh2, '<', $data_file2)
   or die("Can't open \"$data_file2\": $!\n");

my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });

my %data;
while ( my $row = $csv->getline($fh2) ) {
   $data{ $row->[3] } = $row;
}

while ( my $row = $csv->getline($fh1) ) {
   if ( my $linked_row = $data{ $row->[2] } ) {
      $csv->say(\*STDOUT, [ @$row, @$linked_row ]);
   } else {
      $csv->say(\*STDERR, $row);
   }
}

Usage:

script file1.csv file2.csv >merged.csv 2>unpaired.csv

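Assumes column 3 of the first file contains only unique values.
Assumes column 4 of the second file contains only unique values.

CPU: O(N + M) instead of O(N * M)
Memory: O(M) instead of O(N + M)
where N is the number of elements in the first file and M is the number of elements in the second file.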
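
If the uniqueness assumption above does not hold for the second file, one possible variation is to keep an array of lines per key instead of a single row. The following is only a sketch under stated assumptions: `join_lines` is a hypothetical helper name, the sample records are made up, and it uses a plain `split` (so it assumes no quoted or embedded commas) rather than Text::CSV_XS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): joins two lists of
# CSV lines on column 3 of the first list and column 4 of the second.
# Assumes simple CSV with no quoted or embedded commas.
sub join_lines {
    my ($lines1, $lines2) = @_;

    # Index the second file: key => array of every line carrying that key.
    my %by_key;
    for my $line2 (@$lines2) {
        my $key = (split /,/, $line2)[3];
        next unless defined $key;
        push @{ $by_key{$key} }, $line2;
    }

    # Look up each line of the first file with an exact-match hash lookup.
    my (@merged, @unpaired);
    for my $line1 (@$lines1) {
        my $key = (split /,/, $line1)[2];
        if (defined $key && exists $by_key{$key}) {
            push @merged, "$line1,$_" for @{ $by_key{$key} };
        } else {
            push @unpaired, $line1;
        }
    }
    return (\@merged, \@unpaired);
}

# Made-up sample data standing in for file1.csv and file2.csv.
my ($merged, $unpaired) = join_lines(
    [ 'james,smith,3 kids', 'ann,lee,no kids' ],
    [ 'jim,jones,tall,3 kids', 'bob,ray,short,3 kids' ],
);
print "$_\n" for @$merged;
print "$_ was not found\n" for @$unpaired;
```

Note that a hash lookup is an exact string comparison, whereas the regex in the question matches when column 4 ends with column 3, so the two can differ on some inputs.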

Answer 2

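use strict;
use warnings;

my $data_file1 = "file1.csv";  # contains lines such as "james,smith,3 kids"
my $data_file2 = "file2.csv";  # contains lines such as "jim,jones,tall,3 kids"

my $handle1;

my %searchHash;

# Register every column-3 value of file 1 with a match count of 0
open $handle1, '<', $data_file1 or die "Can't open $data_file1: $!";
while (my $line = <$handle1>) {
    chomp($line);
    $searchHash{(split /,/,$line)[2]} = 0;
}
close $handle1;

# Count how often each registered value appears in column 4 of file 2
open $handle1, '<', $data_file2 or die "Can't open $data_file2: $!";
while (my $line = <$handle1>) {
    chomp($line);
    my $key = (split /,/,$line)[3];
    $searchHash{$key}++ if(defined $key && exists $searchHash{$key});
}
close $handle1;

# Print each key with its number of matches
foreach my $key (keys %searchHash) {
    print "$key ($searchHash{$key})\n";
}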
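
The code above only counts matches per key. A possible extension that returns the original lines, as the question asks, could look like the sketch below. It keeps the same direction (index file 1, then stream file 2). `match_lines` is a hypothetical name, the sample records are made up, in-memory filehandles stand in for the two files, and fields are assumed to contain no quoted commas.

```perl
use strict;
use warnings;

# Hypothetical helper: index file 1 by column 3, then stream file 2 and
# emit joined lines. Lines of file 1 with fewer than three fields are
# skipped in this sketch.
sub match_lines {
    my ($fh1, $fh2) = @_;

    # key => all file-1 lines with that key; remember first-seen key order.
    my (%lines_for, @keys_in_order);
    while (my $line = <$fh1>) {
        chomp $line;
        my $key = (split /,/, $line)[2];
        next unless defined $key;
        push @keys_in_order, $key unless exists $lines_for{$key};
        push @{ $lines_for{$key} }, $line;
    }

    # Join every file-2 line with the file-1 lines sharing its key.
    my (@out, %matched);
    while (my $line = <$fh2>) {
        chomp $line;
        my $key = (split /,/, $line)[3];
        next unless defined $key && exists $lines_for{$key};
        $matched{$key} = 1;
        push @out, "$_,$line" for @{ $lines_for{$key} };
    }

    # Report file-1 lines whose key never appeared in file 2.
    for my $key (grep { !$matched{$_} } @keys_in_order) {
        push @out, "$_ was not found" for @{ $lines_for{$key} };
    }
    return @out;
}

# In-memory filehandles with made-up data replace file1.csv and file2.csv.
open my $fh1, '<', \"james,smith,3 kids\nann,lee,no kids\n" or die $!;
open my $fh2, '<', \"jim,jones,tall,3 kids\n" or die $!;
print "$_\n" for match_lines($fh1, $fh2);
```

Unlike the nested loop in the question, this emits joined lines in file 2 order and lists all unmatched lines at the end.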

Using a hash map to search comma-separated files
