Question

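Here's where I started. I'm using a while loop to read rows from the database one array at a time. Some of the items the database returns are duplicates (on certain fields), and I want to keep only the items that are unique on those fields. Then I want to print out the data I kept. I wrote code that I thought would do this, but it gives me everything, including items that are duplicated on those fields. I've been looking and searching and can't figure it out; as a Perl noob, I think I'm missing something simple. The code is below: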

my @uniques = ();
my $output;

while (my @itemArray = $sth->fetchrow_array() ) {
    my $duplicateFlag = 0;  
    foreach (@uniques){
        if(  ($itemArray[3] eq "$_->[3]") and ($itemArray[4] eq "$_->[4]")
               and ($itemArray[5] eq "$_->[5]" ) and ($itemArray[6] eq "$_->[6]" )
               and ($itemArray[7] eq "$_->[7]" ) and ($itemArray[8] == "$_->[8]" ) ){
            $duplicateFlag = 1;
        }
    }
    if( $duplicateflag == 0){
        $refToAdd = \@itemArray;
        push(@uniques, $refToAdd);
        $output .= "$itemArray[3]" . "\t$itemArray[8]" . "\t$itemArray[5]" . "\t$itemArray[7]\n";
    }
}
print $output

Answer 1

One possibility: use a hash to determine whether you've seen an item before. Slightly simplified from your code:

my %dupHash;
while (my @itemArray = $sth->fetchrow_array() ) {
    my $uniqueItem = $itemArray[4];
    if (not exists $dupHash{$uniqueItem}) {
        print "Item $uniqueItem\n";
        $dupHash{$uniqueItem} = \@itemArray;
    }
}

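OK, it's very simple, but you get the idea. By using a hash keyed on the values I want to be unique, I avoid the double loop and its O(n²) running time. (Dang! Those years of college finally paid off!)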
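A minimal self-contained sketch of the same idea follows. There is no database handle here, so a hard-coded @rows list stands in for the rows fetchrow_array() would return (the values are invented), and field 4 is the one checked. Each row costs one hash lookup instead of a scan over every row kept so far.

```perl
use strict;
use warnings;

# Invented rows standing in for what $sth->fetchrow_array() would return
my @rows = (
    [ 1, 'x', 'y', 'a', 'red',  'p', 'q', 'r', 10 ],
    [ 2, 'x', 'y', 'b', 'blue', 'p', 'q', 'r', 20 ],
    [ 3, 'x', 'y', 'c', 'red',  'p', 'q', 'r', 30 ],   # field 4 repeats row 1
);

my %dupHash;   # field-4 value => first row seen with that value
my @kept;      # unique rows, in the order they were first seen
for my $row (@rows) {
    my $uniqueItem = $row->[4];
    # One hash lookup per row replaces the inner foreach over @uniques
    next if exists $dupHash{$uniqueItem};
    $dupHash{$uniqueItem} = $row;
    push @kept, $row;
}
print "Item $_->[4]\n" for @kept;
```

Only the first "red" row and the "blue" row are printed; the third row is skipped because "red" is already a key in %dupHash.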

You probably want a more complex hash key that combines all the fields you're checking for duplicates. Maybe something like this:

 # Probably could use join to make it more efficient...
 my $uniqueKey = "$itemArray[3]:$itemArray[4]:$itemArray[5]:$itemArray[6]:$itemArray[7]:$itemArray[8]";
 if (not exists $dupHash{$uniqueKey}) {
     $dupHash{$uniqueKey} = \@itemArray;
 }

The main point is that if you can store the items in a hash, you avoid looping over all the unique items again and again.
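Here is a sketch of the combined-key variant, again with invented rows in place of the database handle. It builds the key with join over the slice @{$row}[3 .. 8] and prints the same columns (3, 8, 5, 7) that the question's code prints.

```perl
use strict;
use warnings;

# Invented rows in place of $sth->fetchrow_array(); only fields 3..8 matter
my @rows = (
    [ 0, 0, 0, 'a', 'b', 'c', 'd', 'e', 1 ],
    [ 9, 9, 9, 'a', 'b', 'c', 'd', 'e', 1 ],   # same fields 3..8 as the first row
    [ 0, 0, 0, 'a', 'b', 'c', 'd', 'e', 2 ],   # differs in field 8
);

my %dupHash;
my $output = '';
for my $row (@rows) {
    # One key built from every field that defines a duplicate
    my $uniqueKey = join ':', @{$row}[3 .. 8];
    next if exists $dupHash{$uniqueKey};
    $dupHash{$uniqueKey} = $row;
    # Same output columns as the question: 3, 8, 5, 7
    $output .= join("\t", @{$row}[3, 8, 5, 7]) . "\n";
}
print $output;
```

The second row is dropped because its fields 3 through 8 match the first row, even though fields 0 through 2 differ.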

Answer 2

Probably:

$itemArray[8] == "$_->[8]"

should be:

$itemArray[8] eq "$_->[8]"

to match all the others.

Another thing that may fix your problem is removing the quotes around "$_->[8]". It depends on your data.
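Why == and eq disagree: == converts both operands to numbers before comparing, so strings that aren't numeric, or that are spelled differently but have the same numeric value, can compare as equal; under use warnings Perl also warns that the argument "isn't numeric". A small sketch with made-up values, with that warning silenced so only the comparison results show:

```perl
use strict;
use warnings;

my @pairs = ( [ 'abc', 'xyz' ], [ '1.0', '1' ], [ '7', '7' ] );
my @results;
for my $p (@pairs) {
    my ($x, $y) = @$p;
    my $str_same = ($x eq $y) ? 1 : 0;   # compares the characters
    my $num_same = do {
        no warnings 'numeric';           # silence "isn't numeric" for 'abc'/'xyz'
        ($x == $y) ? 1 : 0;              # converts both sides to numbers first
    };
    push @results, [ $x, $y, $str_same, $num_same ];
    print "$x vs $y: eq says $str_same, == says $num_same\n";
}
```

'abc' and 'xyz' both become 0 as numbers, so == calls them equal, and '1.0' == '1' is true while '1.0' eq '1' is false. Only '7' vs '7' agrees under both operators.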

Answer 3

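You're getting all the duplicates because $duplicateflag is undefined on line 13. Running a syntax check on the script with use strict; use warnings; turned on produces the following error: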

Global symbol "$duplicateflag" requires explicit package name at t10.pl line 18.

If we look closely at your definition of "that" variable, it says:

my $duplicateFlag = 0;

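That is, you have a capital F, which means $duplicateflag is not the same variable as $duplicateFlag. Since $duplicateflag is never assigned, it is undef, and undef == 0 still evaluates to true, so every row passes the check and you get false positives.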

To avoid problems like this, always run your scripts with:

use strict;
use warnings;
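The sketch below reproduces both halves of this: strict rejecting the misspelled name (compiled through a string eval so the error lands in $@ instead of aborting the script), and an undefined value comparing equal to 0. The variable $never_set is a hypothetical stand-in for the never-assigned $duplicateflag.

```perl
use strict;
use warnings;

# The question's typo, compiled under strict inside a string eval
my $code = <<'END';
use strict;
my $duplicateFlag = 0;
if ($duplicateflag == 0) { print "unique\n" }
END
eval $code;
my $strict_error = $@;
print "strict says: $strict_error";

# Without strict, the typo just reads an undefined value, and undef == 0 is true
my $never_set;   # hypothetical stand-in for the misspelled variable
my $passes = do {
    no warnings 'uninitialized';   # otherwise: "Use of uninitialized value"
    $never_set == 0;
};
print "undef == 0 is ", ($passes ? "true" : "false"), "\n";
```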

Answer 4

An SQL group by or select distinct is the SQL-database way of keeping rows unique.

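But if you're going to do it in Perl, I agree that hashes and keys are the way to go. However, any delimiter we might suggest could also occur in the data, which opens the door to false matches. A nested-hash approach is unambiguous and uses Perl's natural structures to keep your fields separate.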

That's why I propose the following.

my %uniq;

while ( my @r = $sth->fetchrow_array()) {
    # The post-increment returns 0 (false) the first time a combination is seen
    next if $uniq{ $r[3] }{ $r[4] }{ $r[5] }{ $r[6] }{ $r[7] }{ $r[8] }++;
    # unique code here
    #...
}

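That eliminates the temporary variable, and with it the chance of misspelling it. Still, USUW is the better tool for catching this kind of thing: USUW = "use strict; use warnings;".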
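To make the delimiter problem concrete, here is a sketch with two invented rows of only two fields each (real rows would use fields 3 through 8). Joined with ':' they collapse to the same key, while the nested-hash version, written with next if ... ++ so that a row is skipped only when its combination has been seen before, keeps both.

```perl
use strict;
use warnings;

# Two invented rows whose fields differ but join to the same string "a:b:c"
my @rows = (
    [ 'a:b', 'c'   ],
    [ 'a',   'b:c' ],
);

# Joined-string key: the second row looks like a duplicate of the first
my %joined;
my $joined_kept = grep { !$joined{ join ':', @$_ }++ } @rows;

# Nested hash: each field is its own level, so no delimiter is involved
my %nested;
my $nested_kept = 0;
for my $r (@rows) {
    next if $nested{ $r->[0] }{ $r->[1] }++;   # true only on a repeat
    $nested_kept++;
}

print "joined key kept $joined_kept row(s), nested hash kept $nested_kept\n";
```

The joined key keeps 1 row and the nested hash keeps 2, because the nested hash never has to turn several fields into one string.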

Question about string comparison and references in Perl
