Question

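I want to write a Perl script that:

periodically monitors a directory for incoming CSV files,
on detecting a file, opens and reads it and merges the rows whose second field/column has the same value,
writes the updated CSV file to a new directory, and finally
deletes the input file.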

For example, I have a CSV file containing the following information:

"101","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 123 Anywhere St Apt #A, to, 100 ELSEWHERE RD APT E, is
scheduled for pickup between, 1:00 PM, and 1:30 PM"

"102","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 100 ELSEWHERE RD APT E, to, 123 Anywhere St Apt #A, is
scheduled for pickup between, 9:00 PM, and 9:30 PM"

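I want the script to read and parse the file, detect duplicate values in the second field ("5555555555"), and then create a new CSV file in which the records above are merged into a single record: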

"101","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 123 Anywhere St Apt #A, to, 100 ELSEWHERE RD APT E, is
scheduled for pickup between, 1:00 PM, and 1:30 PM AND your trip
tomorrow from, 100 ELSEWHERE RD APT E, to, 123 Anywhere St Apt #A, is
scheduled for pickup between, 9:00 PM, and 9:30 PM"

My current Perl code successfully detects, reads, and parses the file, but I am lost on how to detect the duplicates and combine the rows.

#!
use strict;
use warnings;
use File::Find;
use Text::CSV;

$| = 1;

use constant {
    #Check for CSV files only
    SUFFIX_LIST => qr/\.(csv)$/,
    DIR_TO_CHECK => "/Users/Me/Desktop/INBOUND/",
};

my @file_list;

while (1) {

    #Recursively search the input directory for CSV files
    find ( sub {
            return unless -f;
            return unless $_ =~ SUFFIX_LIST;

                #Make sure all of the files in the file list array are unique
                #(compare full paths; inside grep, $_ is the list element)
                my $path = $File::Find::name;
                if (!grep { $_ eq $path } @file_list) {
                    push @file_list, $path;
                }
           }, DIR_TO_CHECK 
    );

#If .csv files are found...
if (scalar(@file_list) > 0) {
    print "\nNew Item in Directory\n";

    parseFile($file_list[0]);

    #Delete input file
    unlink $file_list[0];

    print "Deleted File\n";

    #Remove the file from the file list
    shift @file_list;
} else {

    print "No New Item\n";

}

sleep 5;
}

#Subroutine to parse and compare the csv file
sub parseFile {

my $csv = Text::CSV->new({ sep_char     => ',',
                       always_quote => 1,
                       quote_char   => '"',
                       escape_char  => '"',
                       binary       => 1,
                       auto_diag    => 1});

#Get the file that was passed to the function
my $file = $_[0] or die "CSV file not passed in subroutine\n";

#Open file for reading
open(my $data, '<', $file) or die "Could not open '$file' $!\n";

while (my $line = <$data>) {

    print $line;

    if ($csv->parse($line)) {

        my @fields = $csv->fields();

    } else {

        #Report the line that could not be parsed
        warn "Line could not be parsed: $line\n";
    }
}

close $data;
}

I think my approach is off, because I suspect I need to read the file into memory as a whole rather than line by line. Please help, thanks.

Answer 1

I don't write much Perl these days, but here is my answer. Create a hash table that uses the second field as its key, like this:

$hashtbl{555555} = {
                    id => 102,                         # first field
                    names => "doe, john",              # third field
                    msg => "DOE, JOHN, your trip..."   # last field
                    };

If the key already exists in the hash table, append the new row's msg to the existing entry:

if (exists $hashtbl{$KEY}) {
    $hashtbl{$KEY}->{msg} .= " AND $last_field";
}

After reading the whole file, use this hash table to create the new CSV file.
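
A minimal runnable sketch of this hash-based approach, assuming the key is always the second column and the text to append is always the last column. The helper name `merge_rows_by_key` and the shortened sample data are illustrative only and do not come from the question. It reads records with `getline()`, which handles quoted fields that contain newlines, something a plain line-by-line read cannot do.

```perl
use strict;
use warnings;
use Text::CSV;

# Merge rows that share the same value in the second column (index 1).
# Takes an array ref of rows (each an array ref of fields) and returns a
# new array ref of merged rows in first-seen order.
sub merge_rows_by_key {
    my ($rows) = @_;
    my (%merged, @order);
    for my $row (@$rows) {
        my $key = $row->[1];
        if (exists $merged{$key}) {
            # Duplicate key: append this row's last field to the stored row.
            $merged{$key}[-1] .= " AND $row->[-1]";
        } else {
            $merged{$key} = [@$row];    # copy so the input is not modified
            push @order, $key;
        }
    }
    return [ map { $merged{$_} } @order ];
}

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, always_quote => 1 });

# An in-memory file stands in for the real input (assumption for the demo);
# note the newline embedded inside the quoted last field.
my $input = qq{"101","5555555555","DOE, JOHN","trip one,\nat 1:00 PM"\n}
          . qq{"102","5555555555","DOE, JOHN","trip two,\nat 9:00 PM"\n}
          . qq{"103","4444444444","ROE, JANE","trip three"\n};
open my $in, '<', \$input or die "Cannot open in-memory input: $!";

# getline() returns one complete record per call, even when a quoted
# field spans several physical lines.
my @rows;
while (my $row = $csv->getline($in)) {
    push @rows, $row;
}
close $in;

my $merged = merge_rows_by_key(\@rows);

# Terminate each output record with a newline, then print the merged rows.
$csv->eol("\n");
$csv->print(*STDOUT, $_) for @$merged;
```

Keeping a separate `@order` list preserves the order in which keys first appear, which a bare hash would not guarantee.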

Answer 2

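Something like this should work.

It isn't perfect, but it should get you most of the way there. For example, you will still need to add some code to strip the repeated name from the flattened description column.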

my $groups  = parseFile($path);
# flatten_record returns the merged record, so keep its return value
my @records = map { flatten_record($_) } @$groups;
writeFile($newpath, \@records);


sub csv_cols { return qw/ id phone name desc /; }

sub get_csv {
    return Text::CSV->new({
        sep_char     => ',',
        always_quote => 1,
        quote_char   => '"',
        escape_char  => '"',
        binary       => 1,
        auto_diag    => 1
    });
}


#Subroutine to parse csv file
sub parseFile {
    my ($file) = @_;
    die "CSV file not passed in subroutine\n"
         unless $file;

    my $csv = get_csv();

    #Open file for reading
    open(my $fh, '<', $file)
         or die "Could not open '$file' $!\n";

    $csv->column_names( csv_cols() );

    # make hash of arrays: phone number => all rows sharing that number
    my %by_phone;
    for my $row ( @{$csv->getline_hr_all($fh)} ) {
        my $phone = $row->{phone};
        push @{$by_phone{$phone}}, $row;
    }
    close $fh;

    return [ values %by_phone ];
}


sub flatten_record {
    my ($record) = @_;

    die "Empty record." if @$record == 0;

    # A single row needs no merging
    return $record->[0] if @$record == 1;

    # Keep the first row's fields and join every description with " AND "
    return {
        id    => $record->[0]{id},
        phone => $record->[0]{phone},
        name  => $record->[0]{name},
        desc  => join( ' AND ', map { $_->{desc} } @$record ),
    };
}

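sub writeFile {
    my ( $path, $data ) = @_;

    open my $fh, ">", $path
        or die "Error opening '$path' for writing- $!\n";

    my $csv = get_csv();
    $csv->eol("\n");    # end each output record with a newline

    # $data is an array ref, so dereference it to loop over the records
    for my $record ( @$data ) {
        my @row = @{$record}{ csv_cols() };
        $csv->print( $fh, \@row );
    }

    close $fh
        or die "Error closing '$path'- $!\n";
}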
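
The answer leaves stripping the repeated name from the merged description as an exercise. One possible sketch follows, assuming every description starts with the customer's name followed by a comma, as in the question's sample data. The helper name `join_descriptions` is hypothetical and the sample strings are shortened for the demo.

```perl
use strict;
use warnings;

# Hypothetical helper: join the descriptions of one group with " AND ",
# dropping the leading "NAME," prefix from every description after the first.
# Assumes at least one description is passed.
sub join_descriptions {
    my ($name, @descs) = @_;

    # Trim surrounding whitespace: "DOE, JOHN " becomes "DOE, JOHN"
    (my $trimmed = $name) =~ s/^\s+|\s+$//g;

    my ($first, @rest) = @descs;
    for my $desc (@rest) {
        # \Q...\E quotes the name so it is matched literally
        $desc =~ s/^\s*\Q$trimmed\E\s*,\s*//;
    }
    return join ' AND ', $first, @rest;
}

my $desc = join_descriptions(
    'DOE, JOHN ',
    ' DOE, JOHN, your trip tomorrow is scheduled for pickup at 1:00 PM',
    ' DOE, JOHN, your trip tomorrow is scheduled for pickup at 9:00 PM',
);
print "$desc\n";
```

A helper like this could replace the plain concatenation of the desc values in flatten_record, with the name field of the first row passed as the first argument.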

How can I combine CSV rows based on a duplicate field using Perl Text::CSV?
