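I'm very new to Perl and hoping someone can help me with this problem. I need to extract two columns from a CSV file that contains embedded commas. The format looks like this: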
"ID","URL","DATE","XXID","DATE-LONGFORMAT"
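I need to extract the DATE column, the XXID column, and the column immediately after XXID. Note that not every line has the same number of columns.
The XXID column contains a 2-letter prefix that does not always start with the same letters. It can be pretty much any letter of the alphabet. The length is always the same.
Finally, once these three columns are extracted, I need to sort on the XXID column and count the duplicates.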
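Answer 0: (score: 3)
Below is a sample script that uses the Text::CSV module to parse the CSV data. See the module's documentation to find the settings that suit your data.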
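#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows non-ASCII bytes and embedded newlines inside quoted fields
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Each row comes back as an array reference of already-unquoted fields
while (my $row = $csv->getline(*DATA)) {
    print "Date: $row->[2]\n";
    print "Col#1: $row->[3]\n";
    print "Col#2: $row->[4]\n";
}

__DATA__
"ID","URL","DATE","XXID","DATE-LONGFORMAT"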
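Because the question says rows do not all have the same number of columns, fixed indexes like 2, 3 and 4 may not always line up. A minimal sketch of one way around that: find the XXID field in each row by pattern, then take the fields on either side of it. The pattern (two letters followed by five digits), the sample rows, and the helper name extract_rows are all assumptions made for illustration, not something stated in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample rows; note the embedded comma in the first URL
# and the extra column in the second row.
my $data = <<'END';
"1","http://example.com/a,b","2011-01-01","AB12345","January 1, 2011"
"2","extra","http://example.com/c","2011-01-02","CD67890","January 2, 2011"
"3","http://example.com/d","2011-01-03","AB12345","January 3, 2011"
END

# Assumed XXID shape: two letters plus five digits. Adjust to the real data.
my $xxid_re = qr/^[A-Za-z]{2}\d{5}$/;

# Returns a list of [DATE, XXID, column-after-XXID] array references.
sub extract_rows {
    my ($text) = @_;
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @out;
    while (my $row = $csv->getline($fh)) {
        # Index of the first field in this row that looks like an XXID
        my ($i) = grep { defined $row->[$_] && $row->[$_] =~ $xxid_re } 0 .. $#$row;
        # Skip rows with no XXID, or with nothing before it to serve as DATE
        next unless defined $i && $i > 0;
        push @out, [ @{$row}[ $i - 1, $i ], $row->[ $i + 1 ] ];
    }
    close $fh;
    return @out;
}

print join(' | ', map { defined $_ ? $_ : '' } @$_), "\n" for extract_rows($data);
```

With a real file you would open it by name instead of reading from the in-memory string. If some other field can also match the pattern, the first match wins, so a stricter pattern may be needed.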
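Answer 1: (score: 3)
I have released a module called Tie::Array::CSV which lets Perl interact with your CSV file as a native Perl nested array. If you use it, you can take your search logic and apply it as if your data were already in an array of array references. Take a look!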
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp;
use Tie::Array::CSV;
use List::MoreUtils qw/first_index/;
use Data::Dumper;
# this builds a temporary file from DATA
# normally you would just make $file the filename
my $file = File::Temp->new;
print $file <DATA>;
#########
tie my @csv, 'Tie::Array::CSV', $file;
#find column from data in first row
my $colnum = first_index { /^\w.{6}$/ } @{$csv[0]};
print "Using column: $colnum\n";
#extract that column
my @column = map { $csv[$_][$colnum] } (0..$#csv);
#build a hash of repetitions
my %reps;
$reps{$_}++ for @column;
print Dumper \%reps;
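The %reps hash above holds a count for every value, but the question also asks for sorted output and for the duplicates specifically. A small sketch of that last step in core Perl follows; the sample values and the helper name duplicate_counts are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical XXID values, standing in for the @column array above
my @column = qw(CD67890 AB12345 ZZ00001 AB12345 CD67890 AB12345);

# Returns [value, count] pairs for values seen more than once, sorted by value
sub duplicate_counts {
    my %reps;
    $reps{$_}++ for @_;
    return map { [ $_, $reps{$_} ] }
           sort { $a cmp $b }
           grep { $reps{$_} > 1 } keys %reps;
}

printf "%s appears %d times\n", @$_ for duplicate_counts(@column);
# prints:
# AB12345 appears 3 times
# CD67890 appears 2 times
```

To order by frequency instead, the sort block could compare $reps{$b} <=> $reps{$a}.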
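Answer 2: (score: 0)
You definitely want to use a CPAN library for parsing CSV, because you will never account for all the quirks of the format yourself.
Please see: How can I parse quoted CSV in Perl with a regex?
Please see: How do I efficiently parse a CSV file in Perl?
That said, here is a very naive and non-idiomatic solution for the specific string you provided: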
use strict;
use warnings;

my $string = '"ID","URL","DATE","XXID","DATE-LONGFORMAT"';
my @words = ();
my $word = "";
my $quotec = '"';
my $quoted = 0;

foreach my $c (split //, $string)
{
    if ($quoted)
    {
        if ($c eq $quotec)
        {
            $quoted = 0;
            push @words, $word;
            $word = "";
        }
        else
        {
            $word .= $c;
        }
    }
    elsif ($c eq $quotec)
    {
        $quoted = 1;
    }
}

for (my $i = 0; $i < scalar @words; ++$i)
{
    print "column " . ($i + 1) . " = $words[$i]\n";
}