I have two CSV files. The first is a list file containing IDs and names. For example:
1127100,Acanthocolla cruciata
1127103,Acanthocyrta haeckeli
1127108,Acanthometra fusca
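The second is the file I want to swap the names into: when a match is found on the first number of a line, that line should be extracted with the number replaced by the name. The first-column numbers correspond between the two files. For example: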
1127108,1,0.60042
1127103,1,0.819671
1127100,2,0.50421,0.527007
10207,3,0.530422,0.624466
So I want to end up with a CSV file like this:
Acanthometra fusca,1,0.60042
Acanthocyrta haeckeli,1,0.819671
Acanthocolla cruciata,2,0.50421,0.527007
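I tried Perl, but opening two files at once turned out to be messy. So I tried converting one of the CSV files into a string and parsing it that way, but that didn't work. Then I read about grep and other one-liners, but I'm not familiar with them. Would this be possible with grep?
Here is the Perl code I tried: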
use strict;
use warnings;
open my $csv_score, '<', "$ARGV[0]" or die qq{Failed to open "$ARGV[0]" for input: $!\n};
open my $csv_list, '<', "$ARGV[1]" or die qq{Failed to open "$ARGV[1]" for input: $!\n};
open my $out, ">$ARGV[0]_final.txt" or die qq{Failed to open for output: $!\n};
my $string = <$csv_score>;
while ( <$csv_list> ) {
    my ($find, $replace) = split /,/;
    $string =~ s/$find/$replace/g;
    if ($string =~ m/^$replace/){
        print $out $string;
    }
}
close $csv_score;
close $csv_list;
close $out;
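Answer 0 (score: 2)
Your code fails because you read only the first line of the $csv_score file, and you try to print $string every time it is changed. You also don't remove the newline from the end of the lines of the $csv_list file. If you fix those things, it looks like this: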
use strict;
use warnings;
open my $csv_score, '<', "$ARGV[0]" or die qq{Failed to open "$ARGV[0]" for input: $!\n};
open my $csv_list, '<', "$ARGV[1]" or die qq{Failed to open "$ARGV[1]" for input: $!\n};
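open my $out, '>', "$ARGV[0]_final.txt" or die qq{Failed to open "$ARGV[0]_final.txt" for output: $!\n};
# Slurp the whole score file: with $/ undefined, <$fh> reads to end of file.
my $string = do {
    local $/;
    <$csv_score>;
};
while ( <$csv_list> ) {
    chomp;    # drop the trailing newline before splitting
    my ( $find, $replace ) = split /,/;
    $string =~ s/$find/$replace/g;
}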
print $out $string;
close $csv_score;
close $csv_list;
close $out;
Output:
Acanthometra fusca,1,0.60042
Acanthocyrta haeckeli,1,0.819671
Acanthocolla cruciata,2,0.50421,0.527007
10207,3,0.530422,0.624466
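The effect of the missing chomp is easy to see in isolation. This small sketch uses a sample line from the question's list file. It splits the line with and without chomp, then shows what happens when the unchomped name is used as a replacement:

```perl
use strict;
use warnings;

# One line as read from the list file, trailing newline included.
my $line = "1127100,Acanthocolla cruciata\n";

# Without chomp, the second field keeps the newline.
my ( undef, $raw_name ) = split /,/, $line;

# With chomp on a copy, the newline is gone before splitting.
chomp( my $clean = $line );
my ( undef, $clean_name ) = split /,/, $clean;

# Using the unchomped name as a replacement splits a score line in two.
( my $broken = "1127100,2,0.50421\n" ) =~ s/1127100/$raw_name/;

print "raw:   [$raw_name]\n";      # the ] ends up on the next line
print "clean: [$clean_name]\n";
print $broken;                     # "Acanthocolla cruciata" then ",2,0.50421"
```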
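However, replacing the IDs wherever they occur in the text is not a safe way of doing things, because an ID could also be found somewhere other than at the start of a line.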
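The problem shows up with a hypothetical score line whose ID merely begins with a known ID. The value 11271005 is made up for illustration. An unanchored substitution rewrites part of the number. A substitution anchored to the whole first field leaves the number alone:

```perl
use strict;
use warnings;
use v5.10.1;    # for the // operator

my %ids = ( 1127100 => 'Acanthocolla cruciata' );

# Hypothetical score line whose ID only *starts with* a known ID.
my $line = "11271005,1,0.5\n";

# Unanchored text replacement, as in the first program: matches the prefix.
( my $naive = $line ) =~ s/1127100/$ids{1127100}/g;

# Anchored to the whole first field: only an exact ID is replaced,
# and anything unknown is put back unchanged.
( my $safe = $line ) =~ s{^(\d+)(?=,)}{ $ids{$1} // $1 }e;

print $naive;    # Acanthocolla cruciata5,1,0.5
print $safe;     # 11271005,1,0.5
```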
I would instead build a hash from the $csv_list file like this, which also makes the program more concise:
use strict;
use warnings;
use v5.10.1;
use autodie;
my %ids;
{
    open my $fh, '<', $ARGV[1];
    while ( <$fh> ) {
        chomp;
        my ($id, $name) = split /,/;
        $ids{$id} = $name;
    }
}
open my $in_fh, '<', $ARGV[0];
open my $out_fh, '>', "$ARGV[0]_final.txt";
while ( <$in_fh> ) {
    s{^(\d+)}{$ids{$1} // $1}e;
    print $out_fh $_;
}
The output is identical to that of the first program above.
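The question's desired output leaves out the 10207 row, whose ID is not in the list. Both programs above pass that row through unchanged. If dropping such rows is what you want, a variant along these lines should work. It is a sketch: the sub name replace_known_ids is made up, and it takes filehandles so it can be tried on in-memory copies of the question's sample data:

```perl
use strict;
use warnings;

# Hypothetical helper: builds an id => name hash from $list_fh, then copies
# lines from $score_fh to $out_fh with the leading ID replaced by its name.
# Lines whose ID is not in the list are skipped instead of passed through.
sub replace_known_ids {
    my ( $list_fh, $score_fh, $out_fh ) = @_;

    my %ids;
    while ( my $line = <$list_fh> ) {
        chomp $line;
        my ( $id, $name ) = split /,/, $line, 2;
        $ids{$id} = $name;
    }

    while ( my $line = <$score_fh> ) {
        next unless $line =~ /^(\d+),/;    # no leading ID: skip
        my $id = $1;
        next unless exists $ids{$id};      # unknown ID (e.g. 10207): skip
        $line =~ s/^\d+/$ids{$id}/;
        print {$out_fh} $line;
    }
    return;
}

# Usage with in-memory files holding the sample data from the question.
my $list   = "1127100,Acanthocolla cruciata\n"
           . "1127103,Acanthocyrta haeckeli\n"
           . "1127108,Acanthometra fusca\n";
my $scores = "1127108,1,0.60042\n"
           . "1127103,1,0.819671\n"
           . "1127100,2,0.50421,0.527007\n"
           . "10207,3,0.530422,0.624466\n";

my $result = '';
open my $list_fh,  '<', \$list   or die "Cannot open list: $!";
open my $score_fh, '<', \$scores or die "Cannot open scores: $!";
open my $out_fh,   '>', \$result or die "Cannot open output: $!";
replace_known_ids( $list_fh, $score_fh, $out_fh );
close $out_fh;

print $result;
```

For real files, open them by name in place of the in-memory scalars. The sub itself does not need to change.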
Answer 1 (score: 2)
The problem with your code as written is that you only do this once:
my $string = <$csv_score>;
This reads a single line from $csv_score, and the rest of the file is never used.
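The difference is easy to see with an in-memory file holding two lines of the question's score data. A single read in scalar context returns only the first line, while a while loop visits every line:

```perl
use strict;
use warnings;

my $data = "1127108,1,0.60042\n1127103,1,0.819671\n";

# Scalar context: <$fh> returns just the next line.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $first = <$fh>;
close $fh;

# Looping reads every line in turn.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @lines;
while ( my $line = <$fh> ) {
    push @lines, $line;
}
close $fh;

print "first line only: $first";
printf "lines read in the loop: %d\n", scalar @lines;
```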
I would suggest that you need Text::CSV. It is generally a good idea for processing CSV, though for your example it seems … So:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my $csv = Text::CSV->new( { binary => 1 } );
my %replace;
while ( my $row = $csv->getline( \*DATA ) ) {
    last if $row->[0] =~ m/NEXT/;
    $replace{ $row->[0] } = $row->[1];
}
print Dumper \%replace;
# Join all known IDs into one alternation (metacharacters escaped)
# and compile it once.
my $search = join( "|", map {quotemeta} keys %replace );
$search = qr/$search/;
while ( my $row = $csv->getline( \*DATA ) ) {
    # Replace the first field only if it is exactly one of the IDs.
    $row->[0] =~ s/^($search)$/$replace{$1}/;
    $csv->print( \*STDOUT, $row );
    print "\n";
}
__DATA__
1127100,Acanthocolla cruciata
1127103,Acanthocyrta haeckeli
1127108,Acanthometra fusca
NEXT
1127108,1,0.60042
1127103,1,0.819671
1127100,2,0.50421,0.527007
10207,3,0.530422,0.624466
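Note that this still prints the last line of the source data (the 10207 row, whose ID is not in the list) unchanged. The matched rows come out like this: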
"Acanthometra fusca ",1,"0.60042 "
"Acanthocyrta haeckeli ",1,"0.819671 "
"Acanthocolla cruciata ",2,0.50421,"0.527007 "
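(Your data contains spaces, so Text::CSV wraps those fields in quotes.)
If you want to discard the unmatched row, you can test whether the substitution actually happened: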
if ( $row->[0] =~ s/^($search)$/$replace{$1}/ ) {
    $csv->print( \*STDOUT, $row );
    print "\n";
}
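Stripped of the CSV plumbing, the filtering idea comes down to two things. The first is a single compiled alternation of the known IDs. The second is that s/// reports whether it replaced anything. Here is a minimal sketch with plain strings standing in for the first field of each row:

```perl
use strict;
use warnings;

my %replace = (
    1127100 => 'Acanthocolla cruciata',
    1127103 => 'Acanthocyrta haeckeli',
    1127108 => 'Acanthometra fusca',
);

# One alternation of all known IDs, metacharacters escaped, compiled once
# and anchored so it has to match the whole field.
my $alt    = join '|', map {quotemeta} keys %replace;
my $search = qr/^($alt)$/;

my @kept;
for my $field ( '1127108', '10207', '1127100' ) {
    my $copy = $field;
    # s/// returns a true count when it replaced something and a false
    # (empty) value when it did not, so it doubles as the filter test.
    push @kept, $copy if $copy =~ s/$search/$replace{$1}/;
}

print join( "\n", @kept ), "\n";
```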
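(Of course, if you're sure your data contains nothing that a plain split /,/ would mishandle, such as quoted fields with embedded commas, you can keep using split instead.)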
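Here is a concrete case of what split /,/ gets wrong. The line below is hypothetical (not from the question's data): its name field is quoted and contains a comma. A CSV parser such as Text::CSV would return three fields for it, but split returns four:

```perl
use strict;
use warnings;

# Hypothetical line: the name field is quoted and contains a comma.
my $line = qq{1127100,"Acanthocolla cruciata, var. x",2};

my @fields = split /,/, $line;

# split knows nothing about quotes, so the name is cut in two:
# ('1127100', '"Acanthocolla cruciata', ' var. x"', '2')
printf "%d fields\n", scalar @fields;
```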
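Answer 2 (score: 2)
I want to offer a very different approach.
Let's say you feel more comfortable with databases than with Perl's data structures. You can use DBD::CSV to treat your CSV files as tables in a relational database. It uses Text::CSV_XS under the hood (hat tip to @Sobrique). You need to install it from CPAN, because it is not bundled with the default DBI distribution.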
use strict;
use warnings;
use Data::Printer; # for p
use DBI;
my $dbh = DBI->connect( "dbi:CSV:", undef, undef, { f_ext => '.csv' } );
$dbh->{csv_tables}->{names} = { col_names => [qw/id name/] };
$dbh->{csv_tables}->{numbers} = { col_names => [qw/id int float/] };
my $sth_select = $dbh->prepare(<<'SQL');
SELECT names.name, numbers.int, numbers.float
FROM names
JOIN numbers ON names.id = numbers.id
SQL
# column types will be silently discarded
$dbh->do('CREATE TABLE result ( name CHAR(255), int INTEGER, float INTEGER )');
my $sth_insert =
$dbh->prepare('INSERT INTO result ( name, int, float ) VALUES ( ?, ?, ? ) ');
$sth_select->execute;
while (my @res = $sth_select->fetchrow_array ) {
    p @res;
    $sth_insert->execute(@res);
}
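This sets the column names for the two tables (your CSV files), because they have no header row with names. I chose the names based on the data types. It then creates a new table (CSV file) called result and fills it by writing one row at a time.
Along the way, it also dumps each row to STDERR with Data::Printer, for debugging purposes: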
[
[0] "Acanthocolla cruciata",
[1] 2,
[2] 0.50421
]
[
[0] "Acanthocyrta haeckeli",
[1] 1,
[2] 0.819671
]
[
[0] "Acanthometra fusca",
[1] 1,
[2] 0.60042
]
The resulting file looks like this:
$ cat scratch/result.csv
name,int,float
"Acanthocolla cruciata",2,0.50421
"Acanthocyrta haeckeli",1,0.819671
"Acanthometra fusca",1,0.60042
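It may help to see what the JOIN in the SELECT computes. Below is a rough plain-Perl analogue: a hash join written for illustration, not how DBD::CSV evaluates the query internally. The rows are typed in by hand and keep only the columns the SELECT uses. Like the inner JOIN, it drops the 10207 row because that ID has no name. Rows come out in the numbers file's order, which need not match the order of the query output above:

```perl
use strict;
use warnings;

# The two tables as rows, keeping only the columns the SELECT uses:
# names(id, name) and numbers(id, int, float).
my @names = (
    [ 1127100, 'Acanthocolla cruciata' ],
    [ 1127103, 'Acanthocyrta haeckeli' ],
    [ 1127108, 'Acanthometra fusca' ],
);
my @numbers = (
    [ 1127108, 1, 0.60042 ],
    [ 1127103, 1, 0.819671 ],
    [ 1127100, 2, 0.50421 ],
    [ 10207,   3, 0.530422 ],
);

# Build phase: index one table by the join key (id).
my %name_of = map { ( $_->[0] => $_->[1] ) } @names;

# Probe phase: keep only ids present in both tables (inner join),
# projected to (name, int, float) like the SELECT list.
my @result;
for my $row (@numbers) {
    my ( $id, $int, $float ) = @$row;
    next unless exists $name_of{$id};
    push @result, [ $name_of{$id}, $int, $float ];
}

print join( ',', @$_ ), "\n" for @result;
```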