Question

I have a dataset:

10-101570715-101609901-hsa-mir-3158-1   10-101600739-101609661-ENSG00000166171  10-101588288-101609668-ENSG00000166171  10-101588325-101609447-ENSG00000166171  10-101594702-101609439-ENSG00000166171  10-101570560-101596651-ENSG00000166171  

10-103389007-103396515-hsa-mir-1307 10-103389041-103396023-ENSG00000173915  10-103389050-103396074-ENSG00000173915  10-103389050-103396441-ENSG00000173915  10-103389050-103396466-ENSG00000173915  10-103389050-103396466-ENSG00000173915

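Apart from the first element of each line, every line holds several values, some of which are redundant (duplicated), and I want to remove the redundant values. I wrote some code, but I don't think it works.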

open (fh, "file1");
while ($line=<fh>)
{
    chomp ($line);
    @array=$line;
    my @unique = ();
    my %Seen   = ();
    foreach my $elem ( @array )
    {
        next if $Seen{ $elem }++;
        push @unique, $elem;
    }
    print @unique;
}

Answer 1

Use a hash to detect duplicates:

my %seen;
my @removeduplicate = grep { !$seen{$_}++ } @array;
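To see why the one-liner works, here is a small self-contained sketch that applies it to the second line of the sample data, copied from the question (its last two values are identical). It uses only core Perl: `split ' '` splits on runs of whitespace, and `$seen{$_}++` returns the count before incrementing, so only the first occurrence of each value passes the `grep`.

```perl
use strict;
use warnings;

# Second line of the question's data; the last two values are duplicates.
my $line = '10-103389007-103396515-hsa-mir-1307 10-103389041-103396023-ENSG00000173915  '
         . '10-103389050-103396074-ENSG00000173915  10-103389050-103396441-ENSG00000173915  '
         . '10-103389050-103396466-ENSG00000173915  10-103389050-103396466-ENSG00000173915';

# split ' ' ignores leading whitespace and treats runs of spaces as one separator.
my @array = split ' ', $line;

my %seen;
# $seen{$_}++ yields the old count: 0 (false) on a value's first appearance,
# so !0 is true and grep keeps it; later appearances yield >= 1 and are dropped.
my @removeduplicate = grep { !$seen{$_}++ } @array;

print "@removeduplicate\n";
```

The six input values become five, in their original order, with the repeated `10-103389050-103396466-ENSG00000173915` kept once.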

The following code works for me:

use strict;
use warnings;

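open my $fh, '<', 'file.txt' or die "couldn't open file.txt: $!";
while ( my $line = <$fh> )
{
    chomp $line;
    my @array = split ' ', $line;

    # Use a fresh hash for each line; a hash declared outside the loop would
    # also drop values that already appeared on an earlier line.
    my %seen;
    my @removeduplicate = grep { !$seen{$_}++ } @array;
    print "@removeduplicate\n";
}
close $fh;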
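A minimal sketch of the per-line version, assuming each line should be deduplicated independently. `dedup_line` is a hypothetical helper name, and the two-line sample input is made up; it is read through an in-memory filehandle (`open` on a scalar reference) so the example runs without `file.txt`. The values `a` and `c` appear on both lines to show that a fresh `%seen` per line keeps them in each.

```perl
use strict;
use warnings;

# Hypothetical helper: remove repeated values from one whitespace-separated
# line, keeping the first occurrence of each value in its original order.
sub dedup_line {
    my ($line) = @_;
    my %seen;    # fresh hash on every call, so lines do not affect each other
    return grep { !$seen{$_}++ } split ' ', $line;
}

# Made-up sample input standing in for file.txt: two lines sharing values.
my $data = "a b b c\nc d d a\n";
open my $fh, '<', \$data or die "couldn't open in-memory data: $!";

my @output;
while ( my $line = <$fh> ) {
    chomp $line;
    push @output, join ' ', dedup_line($line);
}
close $fh;

print "$_\n" for @output;
```

This prints `a b c` and `c d a`. With a single `%seen` shared across the whole file, the second line would shrink to `d`. On installations with List::Util 1.45 or later, `List::Util::uniq` performs the same order-preserving deduplication.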
