Question

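I want to know whether my array contains duplicate items. There are more than 16,000 of them, so I want to automate the check. There may be other ways to do it, but I started with this one and, unless there is a simple command, I would like to finish it. What I am doing is shifting elements from one array into another and, each time, checking the destination array to see whether the element is "in array" (PHP has a command like that).

So I wrote this subroutine. It works with literals, but not with variables. I suspect the problem is 'eq', or whatever I should be using instead. 'sourcefile' will contain one or more of the words in the destination array.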

// Here I just fetch my file


    $listamails = <STDIN>;
    # Remove the newlines filename
    chomp $listamails;
    # open the file, or exit
    unless ( open(MAILS, $listamails) ) {

    print "Cannot open file \"$listamails\"\n\n";
    exit;
    }
    # Read the list of mails from the file, and store it
    # into the array variable @sourcefile
    @sourcefile = <MAILS>;
    # Close the handle - we've read all the data into @sourcefile now.
    close MAILS;


    my @destination = ('hi', 'bye');

    sub in_array
    {
       my ($destination,$search_for) = @_;
       return grep {$search_for eq $_} @$destination;
    }

    for($i = 0; $i <=100; $i ++)

    {
      $elemento = shift @sourcefile;
      if(in_array(\@destination, $elemento))
      {
        print  "it is";
      }
      else
      {
        print "it aint there";
      }
    }

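If I put 'hi' there instead of $elemento, it works, and when I print the value of $elemento it is also 'hi'. But when I pass the variable, it does not work. I assume that is because of 'eq', but I do not know what else to use. If I use == it complains that 'hi' is not numeric.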

Answer 1

When you want distinct values, think of a hash.

    my %seen;
    @seen{ @array } = ();   # hash slice: one key per distinct value

    if (keys %seen == @array) {
        print "\@array has no duplicate values\n";
    }

Answer 2

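It is not clear what you want. If your first sentence is the only one that matters ("I want to know whether my array contains duplicate items"), then you could use:

    use feature 'say';   # say is not enabled by default

    my %seen;
    if (grep ++$seen{$_} >= 2, @array) {
       say "Has duplicates";
    }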

You said you have a large array, so it may be faster to stop as soon as a duplicate is found.

    my %seen;
    for (@array) {
       if (++$seen{$_} == 2) {
          say "Has duplicates";
          last;
       }
    }
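
If you also need to know which values are repeated, not just whether any are, the same counting hash can collect them. This is only a sketch: the sub name `duplicates` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each value that occurs more than once,
# in the order of its second appearance.
sub duplicates {
    my @items = @_;
    my %seen;
    # ++$seen{$_} == 2 is true exactly once per repeated value
    return grep { ++$seen{$_} == 2 } @items;
}

my @array = qw(hi bye hi ciao bye hi);
my @dupes = duplicates(@array);
print "Duplicates: @dupes\n";   # prints "Duplicates: hi bye"
```

Unlike the early-exit loop, this walks the whole array, so it fits the case where a full report matters more than stopping at the first hit.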

Answer 3

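By the way, when looking for duplicates among a large number of items, a sort-based strategy is much faster than comparing each item against all the others. Once the items are sorted, all duplicates sit right next to each other, so to tell whether something is a duplicate you only have to compare it with the item before it:

    my @sorted = sort @sourcefile;
    for (my $i = 1; $i < @sorted; ++$i) {   # Start at 1 because we'll check the previous one
        print "$sorted[$i] is a duplicate!\n" if $sorted[$i] eq $sorted[$i - 1];
    }

If a value occurs more than twice, this prints the message once for every extra copy, but you can clean that up.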
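
One possible way to do that cleanup is to remember the last value reported and skip further copies of it. A minimal sketch, with a made-up sub name `sorted_duplicates` and made-up sample data:

```perl
use strict;
use warnings;

# Hypothetical helper: sort-based duplicate detection that reports each
# repeated value only once, however many times it occurs.
sub sorted_duplicates {
    my @sorted = sort @_;
    my @dupes;
    for my $i (1 .. $#sorted) {
        next unless $sorted[$i] eq $sorted[$i - 1];    # not a repeat
        next if @dupes && $dupes[-1] eq $sorted[$i];   # already recorded
        push @dupes, $sorted[$i];
    }
    return @dupes;
}

my @sourcefile = qw(bye hi bye hi hi ciao);
print "$_ is a duplicate!\n" for sorted_duplicates(@sourcefile);
```

Because equal values are adjacent after sorting, checking only the most recently recorded duplicate is enough; no extra hash is needed.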

Answer 4

As eugene y said, hashes are definitely the way to go here. Here is a direct translation of the code you posted into a hash-based approach (with a bit more Perlishness added along the way):

    my @destination = ('hi', 'bye');
    my %in_array = map { $_ => 1 } @destination;

    for my $i (0 .. 100) {
      my $elemento = shift @sourcefile;
      if(exists $in_array{$elemento})
      {
        print  "it is";
      }
      else
      {
        print "it aint there";
      }
    }

Also, if you mean to check every element of @sourcefile against @destination (rather than testing only the first 101 elements), you should replace the for line with

    while (@sourcefile) {
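
Put together, the full loop might look like the sketch below. It assumes @sourcefile has already been chomped; the sample data and the @results array are made up for illustration.

```perl
use strict;
use warnings;

my @destination = ('hi', 'bye');
my %in_array    = map { $_ => 1 } @destination;

# Made-up sample data, assumed to be already chomped
my @sourcefile = ('hi', 'ciao', 'bye');

my @results;
while (@sourcefile) {                 # runs until @sourcefile is empty
    my $elemento = shift @sourcefile;
    push @results, exists $in_array{$elemento} ? "it is" : "it aint there";
}
print "$_\n" for @results;
```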

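Also, don't forget to chomp any values read from a file! Lines read from a file end with a line terminator (the \r\n or \n mentioned in the comments on the original question), which makes both eq and hash lookups report otherwise-matching values as different. This is most likely why your code did not work in the first place, and switching to sort or hashes will not fix it. First chomp your input to make it work, then use sort or hashes to make it efficient.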
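
To see the effect, here is a small sketch that looks up the same lines before and after cleaning. The helper `clean_lines` and the sample lines are hypothetical; it strips "\r\n" as well as "\n", on the assumption that the file may have Windows line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a trailing "\n" or "\r\n" from each line.
# Plain chomp removes only "\n" (with the default $/), so a file with
# Windows line endings would keep a stray "\r".
sub clean_lines {
    my @lines = @_;
    s/\r?\n\z// for @lines;
    return @lines;
}

my %in_array = map { $_ => 1 } ('hi', 'bye');

my @raw   = ("hi\n", "bye\r\n", "ciao\n");   # as read from a file
my @clean = clean_lines(@raw);

for my $i (0 .. $#raw) {
    printf "raw: %s, cleaned: %s\n",
        (exists $in_array{ $raw[$i] }   ? "found" : "not found"),
        (exists $in_array{ $clean[$i] } ? "found" : "not found");
}
```

The raw lines are never found, because "hi\n" and "hi" are different hash keys; after cleaning, 'hi' and 'bye' match and 'ciao' does not.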

Perl: need the right grep operator to match the value of a variable
