Question

我有一个文件（比如bugs.txt），它是通过运行一些代码生成的。此文件包含JIRAS列表。我想编写一个代码，可以删除此文件中的重复条目。

逻辑应该是通用的，因为bugs.txt文件每次都会不同。

示例输入文件bugs.txt：

BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

示例输出：

BUG-111, BUG-122, BUG-123, JIRA-221, JIRA-234

我的试用代码：

my $file1="/path/to/file/bugs.txt";
my $Jira_nums;
open(FH, '<', $file1) or die $!;
  {
    local $/;
    $Jira_nums = <FH>;
  }
close FH;

我需要帮助设计从文件bugs.txt中删除重复条目的逻辑

Answer 1

You just need to add these lines to your script:

my %seen;
my @no_dups = grep{!$seen{$_}++}split/,?\s/,$Jira_nums;

You'll get:

use strict;
use warnings;
use Data::Dumper;

my $file1="/path/to/file/bugs.txt";
my $Jira_nums;
open(my $FH, '<', $file1) or die $!; # use lexical file handler
  {
    local $/;
    $Jira_nums = <$FH>;
  }
my %seen;
my @no_dups = grep{!$seen{$_}++}split/,?\s/,$Jira_nums;
say Dumper \@no_dups;

For input data like:

BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

it gives:

$VAR1 = [
          'BUG-111',
          'BUG-122',
          'BUG-123',
          'JIRA-221',
          'JIRA-234'
        ];

Answer 2

你可以试试这个：

use strict;
use warnings;

my @bugs = "";
@bugs =  split /\,?(\s+)/, $_ while(<DATA>);
my @Sequenced = map {$_=~s/\s*//g; $_} RemoveDup(@bugs);

print "@Sequenced\n";

sub RemoveDup {     my %checked;   grep !$checked{$_}++, @_;  }


__DATA__
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

perl代码从文件中删除重复的条目

2 个答案: