Question

I'm trying to go through a very large CSV file to find all the unique strings in each column. For example:

John
John
John
Mark

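should return John and Mark.

I can't figure out what is wrong with my code. The error messages don't help either (especially the 3rd and 4th errors):

"my" variable @found masks earlier declaration in same scope at getdata.pl line 66.
"my" variable $answer masks earlier declaration in same statement at getdata.pl line 67.
syntax error at getdata.pl line 55, near ") {"
Global symbol "@master_fields" requires explicit package name (did you forget to declare "my @master_fields"?) at getdata.pl line 58.
syntax error at getdata.pl line 61, near "} else"

Can anyone point me in the right direction?

Here is my code: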

# open file
open my $lines, '<', 'data.csv' or die "Unable to open data.csv\n";
my @records = <$lines>;
close $lines or die "Unable to close data.csv\n";   # Close the input file

# iterate through each line
foreach my $line ( @records ) {

    if ( $csv->parse($line) ) {

        my @master_fields = $csv->fields();

        # if the string is already in the @found array, go to next line.
        if ( grep( /^$master_fields[0]$/, @found ) {
            next;
        }
        else {
            # else; add to the @found array
            push @found, $master_fields[0];
        }        
    }
    else {
        warn "Line/record could not be parsed: @yob_records\n";
    }
}

Answer 1

if ( grep( /^$master_fields[0]$/, @found ){

should be

if ( grep( /^$master_fields[0]$/, @found ) ){

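Since $master_fields[0] holds a plain string rather than a regex pattern, you need to convert it into a regex pattern by escaping its metacharacters.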

grep( /^$master_fields[0]$/, @found )

should be

grep( /^\Q$master_fields[0]\E$/, @found )

Since you want an exact match against $master_fields[0] (and $ also matches just before a trailing newline),

grep( /^\Q$master_fields[0]\E$/, @found )

should be

grep( /^\Q$master_fields[0]\E\z/, @found )

or better,

grep( $_ eq $master_fields[0], @found )
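
The difference between these variants shows up only with awkward data, so here is a small sketch of it. The contents of @found, $field, and $name are hypothetical sample values chosen to trigger each case; they are not taken from the asker's file.

```perl
use strict;
use warnings;

# Hypothetical sample data: values already seen.
my @found = ( "John", "J.hn", "Mark\n" );

# Without \Q...\E the "." in the field acts as a regex wildcard,
# so the pattern also matches "John".
my $field    = "J.hn";
my $unquoted = grep { /^$field$/ } @found;         # matches "John" and "J.hn"
my $quoted   = grep { /^\Q$field\E$/ } @found;     # matches only "J.hn"

# "$" also matches just before a trailing newline, so "Mark\n" counts
# as a match for "Mark". \z and eq both require the string to end there.
my $name   = "Mark";
my $dollar = grep { /^\Q$name\E$/ } @found;        # matches "Mark\n"
my $z      = grep { /^\Q$name\E\z/ } @found;       # no match
my $eq     = grep { $_ eq $name } @found;          # no match

print "unquoted=$unquoted quoted=$quoted dollar=$dollar z=$z eq=$eq\n";
```

In scalar context grep returns the number of matching elements, which is why the counts can be compared directly. The eq form behaves like the \Q...\E\z form without building a regex at all.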

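Finally, you're using the CSV parser incorrectly: let it determine where each record ends by using getline instead of splitting the input on newlines. You're also being very inefficient, O(N²) instead of O(N), by using an array instead of a hash.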

use strict;
use warnings;
use Text::CSV_XS;  # Or Text::CSV

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });  # Or Text::CSV

my $qfn = 'data.csv';
open(my $fh, '<', $qfn)
    or die("Unable to open \"$qfn\": $!\n");

my %found;
while ( my $row = $csv->getline($fh) ) {
    ++$found{ $row->[0] };
}

my @found = sort keys %found;
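
The code above collects the unique values of the first column only, while the question asks for every column. One possible extension, as a sketch: keep one "seen" hash per column. The @rows array below is hypothetical sample data standing in for the array references that $csv->getline($fh) returns, so the example runs without a CSV file; in the real script the inner loop would go inside the while loop.

```perl
use strict;
use warnings;

# Hypothetical rows, shaped like the array refs returned by getline.
my @rows = (
    [ 'John', 'NY' ],
    [ 'John', 'LA' ],
    [ 'John', 'NY' ],
    [ 'Mark', 'NY' ],
);

# One hash per column: $seen[$col]{$value} counts occurrences.
my @seen;
for my $row (@rows) {
    for my $col ( 0 .. $#$row ) {
        ++$seen[$col]{ $row->[$col] };
    }
}

# Sorted unique values for each column.
my @unique = map { [ sort keys %{ $_ || {} } ] } @seen;

for my $col ( 0 .. $#unique ) {
    print "column $col: @{ $unique[$col] }\n";
}
```

Each value costs one hash lookup, so the whole pass stays linear in the number of fields. Memory grows with the number of distinct values per column rather than with the number of rows.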
