I have a dataset:
10-101570715-101609901-hsa-mir-3158-1 10-101600739-101609661-ENSG00000166171 10-101588288-101609668-ENSG00000166171 10-101588325-101609447-ENSG00000166171 10-101594702-101609439-ENSG00000166171 10-101570560-101596651-ENSG00000166171
10-103389007-103396515-hsa-mir-1307 10-103389041-103396023-ENSG00000173915 10-103389050-103396074-ENSG00000173915 10-103389050-103396441-ENSG00000173915 10-103389050-103396466-ENSG00000173915 10-103389050-103396466-ENSG00000173915
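Apart from the first element of each line, I have multiple values, some of which are redundant, and I want to remove the redundant ones. I wrote some code, but I don't think it works.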
open (fh, "file1");
while ($line=<fh>)
{
    chomp ($line);
    @array=$line;
    my @unique = ();
    my %Seen = ();
    foreach my $elem ( @array )
    {
        next if $Seen{ $elem }++;
        push @unique, $elem;
    }
    print @unique;
}
Answer:
Use a hash to detect duplicates:
my %seen;
my @removeduplicate = grep { !$seen{$_}++ } @array;
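As a minimal sketch of the idiom above, the hypothetical helper `dedup_line` below splits one line on whitespace and keeps the first occurrence of each field, in order. It declares `%seen` inside the sub, on the assumption that duplicates should only be detected within a single line. The sample line is the second line of the dataset in the question, whose last two fields are identical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whitespace-separated fields of $line
# with repeats removed, keeping the first occurrence of each field.
sub dedup_line {
    my ($line) = @_;
    my %seen;    # fresh hash per call, so only repeats within this line are dropped
    # $seen{$_}++ is 0 (false) the first time a field is seen, so !0 keeps it
    return grep { !$seen{$_}++ } split ' ', $line;
}

# Second line of the sample data from the question
my $line = '10-103389007-103396515-hsa-mir-1307 '
         . '10-103389041-103396023-ENSG00000173915 '
         . '10-103389050-103396074-ENSG00000173915 '
         . '10-103389050-103396441-ENSG00000173915 '
         . '10-103389050-103396466-ENSG00000173915 '
         . '10-103389050-103396466-ENSG00000173915';

my @unique = dedup_line($line);
print scalar(@unique), " fields: @unique\n";    # the repeated last field appears once
```

If you have List::Util 1.45 or newer, its `uniq` function also keeps the first occurrence of each value in order, so `my @unique = uniq split ' ', $line;` (after `use List::Util qw(uniq);`) is an alternative to the hash idiom.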
The following code works fine for me:
use strict;
use warnings;

open my $fh, "<", 'file.txt' or die "couldn't open file.txt: $!";
while ( my $line = <$fh>)
{
    chomp $line;
    my @array = split (' ', $line);    # split on runs of whitespace
    my %seen;                          # reset for every line, so only repeats within the same line are removed
    my @removeduplicate = grep { !$seen{$_}++ } @array;
    print "@removeduplicate\n";
}
close $fh;
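The question says the redundant values are the ones after the first element of each line. A sketch that leaves the first field untouched and de-duplicates only the rest could look like this; `dedup_after_first` and `process_fh` are hypothetical names, and an in-memory filehandle over a shortened line from the sample data stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first field as-is and drop repeated values
# among the remaining fields. %seen is declared here, so it is reset per line.
sub dedup_after_first {
    my ($line) = @_;
    my ($first, @rest) = split ' ', $line;
    return '' unless defined $first;    # blank line: nothing to keep
    my %seen;
    my @kept = grep { !$seen{$_}++ } @rest;
    return join ' ', $first, @kept;
}

# Hypothetical helper: read lines from any filehandle, print cleaned lines to STDOUT.
sub process_fh {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        print dedup_after_first($line), "\n";
    }
}

# Demo: an in-memory filehandle (shortened line from the sample data)
# stands in for the real input file.
my $text = "10-103389007-103396515-hsa-mir-1307 "
         . "10-103389050-103396466-ENSG00000173915 "
         . "10-103389050-103396466-ENSG00000173915\n";
open my $in, '<', \$text or die "cannot open in-memory handle: $!";
process_fh($in);
close $in;
```

To run it on the real file, open it as in the answer (`open my $fh, '<', 'file.txt' or die ...`) and pass `$fh` to `process_fh`.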