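Perl code to remove duplicate entries from a file

Time: 2017-08-04 10:03:25

Tags: regex perl

I have a file (say bugs.txt) that is generated by running some code. This file contains a list of JIRAs. I want to write code that removes the duplicate entries from this file.

The logic should be generic, because the contents of bugs.txt will be different each time.

Sample input file bugs.txt: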

BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

Sample output:

BUG-111, BUG-122, BUG-123, JIRA-221, JIRA-234

My attempt so far:

my $file1="/path/to/file/bugs.txt";
my $Jira_nums;
open(FH, '<', $file1) or die $!;
  {
    local $/;
    $Jira_nums = <FH>;
  }
close FH;

I need help designing the logic that removes the duplicate entries from bugs.txt.

2 answers:

Answer 0 (score: 1)

You just need to add these lines to your script:

my %seen;
my @no_dups = grep { !$seen{$_}++ } split /,?\s+/, $Jira_nums;
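The line above combines two steps: `split` breaks the slurped text into entries, and `grep` with a `%seen` hash keeps only the first occurrence of each one. As a minimal sketch of how that works, here is the same idea wrapped in a helper. The name `dedupe_entries` and the hard-coded sample string are assumptions for illustration, not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): split a comma/whitespace
# separated string and keep only the first occurrence of each entry.
sub dedupe_entries {
    my ($text) = @_;
    my %seen;
    # $seen{$_}++ returns the previous count, which is 0 (false) the first
    # time an entry appears, so !$seen{$_}++ is true only for first occurrences.
    # length($_) drops any empty field caused by leading whitespace.
    return grep { length($_) && !$seen{$_}++ } split /,?\s+/, $text;
}

# Sample data standing in for the contents of bugs.txt.
my $sample = "BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221\n";
print join(', ', dedupe_entries($sample)), "\n";
# prints: BUG-111, BUG-122, BUG-123, JIRA-221, JIRA-234
```

Because a hash lookup is constant time on average, this approach stays a single pass over the entries and preserves their original order.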

The complete script then looks like this:

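use strict;
use warnings;
use Data::Dumper;

my $file1 = "/path/to/file/bugs.txt";
my $Jira_nums;
open(my $FH, '<', $file1) or die $!;   # use a lexical filehandle
  {
    local $/;             # slurp mode: read the whole file at once
    $Jira_nums = <$FH>;
  }
close $FH;
my %seen;
# Split on an optional comma followed by whitespace; keep first occurrences only.
my @no_dups = grep { !$seen{$_}++ } split /,?\s+/, $Jira_nums;
print Dumper(\@no_dups);   # print instead of say, which needs "use feature 'say'"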

For input data like:

BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

it gives:

$VAR1 = [
          'BUG-111',
          'BUG-122',
          'BUG-123',
          'JIRA-221',
          'JIRA-234'
        ];
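The question asks for the duplicates to be removed from the file itself, while the script above only prints the unique entries. One possible way to write them back is sketched below. The helper name `dedupe_file`, the comma-plus-space output format, and the temporary file used for the demo are all assumptions; in real use you would pass the path to bugs.txt instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite $path so each entry appears once,
# in first-seen order, on a single comma-separated line.
sub dedupe_file {
    my ($path) = @_;

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;
    $text = '' unless defined $text;     # guard against an empty read

    my %seen;
    my @unique = grep { length($_) && !$seen{$_}++ } split /,?\s+/, $text;

    # Overwrite the original file with the unique entries.
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} join(', ', @unique), "\n";
    close $out or die "Cannot close $path: $!";
    return @unique;
}

# Demo on a temporary file standing in for bugs.txt.
my ($fh, $tmp) = tempfile(UNLINK => 1);
print {$fh} "BUG-111, BUG-122, BUG-111\nJIRA-221, BUG-122, JIRA-221\n";
close $fh;
dedupe_file($tmp);
```

Note that this overwrites the file in place, so an interrupted run could leave it truncated. If that matters, writing to a second file and renaming it over the original is a safer variation.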

Answer 1 (score: 0)

You can try this:

use strict;
use warnings;

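my @bugs;
# Append the entries from every line instead of overwriting @bugs each time.
push @bugs, split(/,?\s+/, $_) while <DATA>;
my @Sequenced = RemoveDup(@bugs);

print join(', ', @Sequenced), "\n";

# Return the arguments with duplicates removed, keeping first-seen order.
sub RemoveDup {
    my %checked;
    return grep { !$checked{$_}++ } @_;
}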


__DATA__
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
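Neither answer mentions it, but the same result can also be had without a hand-written hash: the core module List::Util provides a `uniq` function that returns its arguments with duplicates removed, keeping the first occurrence of each. The sketch below assumes List::Util 1.45 or later (the version that introduced `uniq`) and uses a hard-coded sample line in place of the file contents.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # uniq requires List::Util 1.45 or later

# Sample data standing in for one line of bugs.txt.
my $line = "BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221";

# Split on a comma plus optional whitespace, then drop repeated entries.
my @unique = uniq(split /,\s*/, $line);

print join(', ', @unique), "\n";
# prints: BUG-111, BUG-122, BUG-123, JIRA-221, JIRA-234
```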