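When several records in a CSV file share the same genus (the first part of the species name in one column), I need to keep a single record per unique genus: the one with the largest number in another column.
So if a genus appears more than once, the record with the largest number in the last column is chosen to represent that genus.
I have extracted the information into arrays, but I'm having trouble combining the two to make the selection. I am using https://perlmaven.com/unique-values-in-an-array-in-perl, which helps, but for a repeated genus I also need to take the largest number in the last column into account.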
use strict;
use warnings;
open taxa_fh, '<', "$ARGV[0]" or die qq{Failed to open "$ARGV[0]" for input: $!\n};
open match_fh, ">$ARGV[0]_genusLongestLEN.csv" or die qq{Failed to open for output: $!\n};
my @unique;
my %seen;
my %hash;
while ( my $line = <taxa_fh> ) {
chomp( $line );
my @parts = split( /,/, $line );
my @name = split( / /, $parts[3]);
my @A = $name[0];
my @B = $parts[5];
@seen{@A} = ();
my @merged = (@A, grep{!exists $seen{$_}} @B);
my @merged = (@A, @B);
@hash{@A} = @B;
print "$line\n";
}
close taxa_fh;
close match_fh;
Input:
AB179735.1.1711,AB179735.1.1711,278983,Eucyrtidium hexagonatum,0,1600
AB179736.1.1725,AB179736.1.1725,278986,Pterocorys zancleus,0,1763
AB181888.1.1758,AB181888.1.1758,281609,Protoperidinium crassipes,0,1700
AB181890.1.1709,AB181890.1.1709,281610,Protoperidinium denticulatum,0,1800
AB181892.1.1738,AB181892.1.1738,281611,Protoperidinium divergens,0,1800
AB181894.1.1744,AB181894.1.1744,281612,Protoperidinium leonis,0,1500
AB181899.1.1746,AB181899.1.1746,281613,Protoperidinium pallidum,0,1600
AB181902.1.1741,AB181902.1.1741,261845,Protoperidinium pellucidum,0,1750
AB181904.1.1734,AB181904.1.1734,281614,Protoperidinium punctulatum,0,1599
AB181907.1.1687,AB181907.1.1687,281615,Protoperidinium thorianum,0,1600
AB120001.1.1725,AB120001.1.1725,244960,Gyrodinium spirale,0,1500
AB120002.1.1725,AB120002.1.1725,244961,Gyrodinium fusiforme,0,1800
AB120003.1.1724,AB120003.1.1724,244962,Gyrodinium rubrum,0,1700
AB120004.1.1723,AB120004.1.1723,244963,Gyrodinium helveticum,0,1500
AB120309.1.1800,AB120309.1.1800,4442,Camellia sinensis,0,1700
Desired output:
AB179735.1.1711,AB179735.1.1711,278983,Eucyrtidium hexagonatum,0,1600
AB179736.1.1725,AB179736.1.1725,278986,Pterocorys zancleus,0,1763
AB181890.1.1709,AB181890.1.1709,281610,Protoperidinium denticulatum,0,1800
AB120002.1.1725,AB120002.1.1725,244961,Gyrodinium fusiforme,0,1800
AB120309.1.1800,AB120309.1.1800,4442,Camellia sinensis,0,1700
Answer 0: (score: 1)
use Text::CSV_XS qw( );
my $csv = Text::CSV_XS->new({
auto_diag => 2,
binary => 1,
quote_space => 0,
});
my %by_genus;
while ( my $row = $csv->getline(\*ARGV) ) {
my ($genus) = split(' ', $row->[3]);
$by_genus{$genus} = $row
if !$by_genus{$genus}
|| $row->[5] > $by_genus{$genus}[5];
}
$csv->say(select(), $_) for values(%by_genus);
Answer 1: (score: 0)
Naming variables properly makes the code more readable:
if($_REQUEST['param3'] == 'asdf' || $_REQUEST["param3"] == ""){
$B_qty = array();
$conn_B = connectBdb();
$ChkBShopItemSQL = "select itemcode,shopqty from shopitem where shopcode = 'B' AND shopqty <> 0;";
foreach ($conn_B->query($ChkBShopItemSQL) AS $result) {
$B_qty[$result['itemcode']] = $result['shopqty'];
}
echo json_encode($B_qty);
}
if($_REQUEST["testing"] == "asdf" || $_REQUEST["testing"] == ""){
$A_qty = array();
$conn_A = connectAdb();
$ChkAShopItemSQL = "select itemcode,shopqty from shopitem where shopcode = 'A' AND shopqty <> 0;";
foreach ($conn_A->query($ChkAShopItemSQL) AS $result) {
$A_qty[$result['itemcode']] = $result['shopqty'];
}
echo json_encode($A_qty);
}
The order of the output lines is random.
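If a stable order matters, one option is to remember the order in which each genus is first seen and print in that order. The following is only a minimal sketch: it assumes the six-column, comma-separated layout of the sample input, no quoted fields containing commas, and lines that are already chomped. The sub name `best_per_genus` is hypothetical, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one line per genus (the line with the
# largest value in the last column), in order of first appearance.
# On a tie, the line seen first is kept.
sub best_per_genus {
    my @lines = @_;
    my ( %best, @order );
    for my $line (@lines) {
        my @fields  = split /,/, $line;
        my ($genus) = split ' ', $fields[3];    # first word of the species name
        my $len     = $fields[5];
        if ( !exists $best{$genus} ) {
            push @order, $genus;                # remember first-seen order
            $best{$genus} = { len => $len, line => $line };
        }
        elsif ( $len > $best{$genus}{len} ) {
            $best{$genus} = { len => $len, line => $line };
        }
    }
    return map { $best{$_}{line} } @order;
}

# A few lines taken from the sample input in the question.
my @sample = (
    'AB181888.1.1758,AB181888.1.1758,281609,Protoperidinium crassipes,0,1700',
    'AB181890.1.1709,AB181890.1.1709,281610,Protoperidinium denticulatum,0,1800',
    'AB120001.1.1725,AB120001.1.1725,244960,Gyrodinium spirale,0,1500',
    'AB120002.1.1725,AB120002.1.1725,244961,Gyrodinium fusiforme,0,1800',
);
print "$_\n" for best_per_genus(@sample);
```

With these four lines, the Protoperidinium denticulatum record is printed first and the Gyrodinium fusiforme record second, because that is the order in which the two genera first occur.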
Answer 2: (score: 0)
You can also do this with a Perl one-liner:
perl -F, -lane ' ($g=$F[3])=~s/(^\S+).*/$1/; if( $mx{$g}<$F[-1])
{ $kv{$g}=$_;$mx{$g}=$F[-1] } END { print $kv{$_} for(keys %kv) } ' file
With the given input in the file cara.txt, the output is:
$ perl -F, -lane ' ($g=$F[3])=~s/(^\S+).*/$1/; if( $mx{$g}<$F[-1])
{ $kv{$g}=$_;$mx{$g}=$F[-1] } END { print $kv{$_} for(keys %kv) } ' cara.txt
AB179736.1.1725,AB179736.1.1725,278986,Pterocorys zancleus,0,1763
AB179735.1.1711,AB179735.1.1711,278983,Eucyrtidium hexagonatum,0,1600
AB120309.1.1800,AB120309.1.1800,4442,Camellia sinensis,0,1700
AB120002.1.1725,AB120002.1.1725,244961,Gyrodinium fusiforme,0,1800
AB181890.1.1709,AB181890.1.1709,281610,Protoperidinium denticulatum,0,1800
$
Answer 3: (score: -1)
Not pretty, but it gets the job done:
#!/usr/bin/perl
use strict;
my @data = `cat /var/tmp/test.in`;
my %genuses = ();
foreach my $line ( @data ) {
chomp($line);
my @splitline = split(',', $line);
my $genus = $splitline[3];
my $num = $splitline[5];
my ( $name, $extra ) = split(' ', $genus);
if ( exists $genuses{$name}->{'num'} ) {
if ( $genuses{$name}->{'num'} < $num ) {
$genuses{$name}->{'num'} = $num;
$genuses{$name}->{'line'} = $line;
}
else {
next;
}
}
else {
$genuses{$name}->{'num'} = $num;
$genuses{$name}->{'line'} = $line;
}
}
foreach my $genus ( keys %genuses ) {
print "$genuses{$genus}->{'line'}";
print "\n";
}
Output:
[root@localhost tmp]# ./test.pl
AB179736.1.1725,AB179736.1.1725,278986,Pterocorys zancleus,0,1763
AB179735.1.1711,AB179735.1.1711,278983,Eucyrtidium hexagonatum,0,1600
AB120309.1.1800,AB120309.1.1800,4442,Camellia sinensis,0,1700
AB120002.1.1725,AB120002.1.1725,244961,Gyrodinium fusiforme,0,1800
AB181890.1.1709,AB181890.1.1709,281610,Protoperidinium denticulatum,0,1800
I don't see an obvious way to sort the output.
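For what it's worth, the hash keys can be sorted before printing. This is just a sketch under assumptions: `%line_for` is a hypothetical hash holding the selected line for each genus (filled here with the five lines from the output above), and `last_field` is a hypothetical helper that returns the last comma-separated column.

```perl
use strict;
use warnings;

# Hypothetical hash: genus name => the line selected for that genus.
my %line_for = (
    Pterocorys      => 'AB179736.1.1725,AB179736.1.1725,278986,Pterocorys zancleus,0,1763',
    Eucyrtidium     => 'AB179735.1.1711,AB179735.1.1711,278983,Eucyrtidium hexagonatum,0,1600',
    Camellia        => 'AB120309.1.1800,AB120309.1.1800,4442,Camellia sinensis,0,1700',
    Gyrodinium      => 'AB120002.1.1725,AB120002.1.1725,244961,Gyrodinium fusiforme,0,1800',
    Protoperidinium => 'AB181890.1.1709,AB181890.1.1709,281610,Protoperidinium denticulatum,0,1800',
);

# Hypothetical helper: last comma-separated field of a line.
sub last_field { ( split /,/, $_[0] )[-1] }

# Option 1: alphabetical by genus name.
my @by_name = map { $line_for{$_} } sort keys %line_for;

# Option 2: largest last-column value first; ties broken by genus name.
my @by_len = map { $line_for{$_} }
    sort { last_field( $line_for{$b} ) <=> last_field( $line_for{$a} ) or $a cmp $b }
    keys %line_for;

print "$_\n" for @by_name;
```

In the script above, the same idea would amount to looping over `sort keys %genuses` instead of the unsorted keys.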