使用Perl删除文件中的重复字符串

时间:2015-09-01 13:26:27

标签: perl

我想对文本文件中的某些字符串执行以下操作。我得到了我需要的基本功能的帮助,但现在我想对重复的行进行排序并仅打印一次。

我想从字符串中删除Z,ZN和LVT并对它们进行排序。

输入:

     abchsfk/jshflka/ZN                       (cellLVT)
     abchsfk/jshflka/ZN                       (cellLVT)

     asjkfsa/sfklfkshfsf/Z                    (mobLVT)
     asjkfsa/sfklfkshfsf/Z                    (mobLVT)

     asjhfdjkfd/sjfdskjfhdk/hsakfshf/Z        (celLVT)
     asjhfdjkfd/sjfdskjfhdk/hsakfshf/Z        (celLVT)

     asjhdjs/jhskjds/ZN                       (abcLVT)
     asjhdjs/jhskjds/ZN                       (abcLVT)

     shdsjk/jhskd/ZN                          (xyzLVT)
     shdsjk/jhskd/ZN                          (xyzLVT)

期望的输出:

   abchsfk/jshflka                     cell

   asjkfsa/sfklfkshfsf                 mob

   asjhfdjkfd/sjfdskjfhdk/hsakfshf     cel

   asjhdjs/jhskjds                     abc

   shdsjk/jhskd                        xyz

代码:

     if ($line =~ /LVT/ && ($line =~ /ZN/ || $line =~ /Z/) )        

          ####  matches the words LVT and ( Z or ZN)

      {   
             $line =~ s/\/ZN?|\(|LVT\)//g;

            my @line_out = $line;

            $lvt_out = sort::$line_out();

            print OUT " $lvt_out \n";

     }

2 个答案:

答案 0 :(得分:4)

只需使用map以及List::MoreUtils中的uniq即可完成此操作。我从您的评论中看到,您实际上并不想对这些数据进行排序,因此我将其排除在外

此程序需要输入文件的路径作为命令行上的参数

use strict;
use warnings;

use List::MoreUtils qw/ uniq /;

my @rows = uniq map { m| ([\w/]+)/ZN? \s+ \((\w+)LVT\) |x ? "$1\t$2" : () } <>;

printf "%-31s %s\n", split /\t/ for @rows;

输出

abchsfk/jshflka                 cell
asjkfsa/sfklfkshfsf             mob
asjhfdjkfd/sjfdskjfhdk/hsakfshf cel
asjhdjs/jhskjds                 abc
shdsjk/jhskd                    xyz

答案 1 :(得分:1)

你实际上并没有在那里整理任何东西。获得你的输出:

#!/usr/bin/env perl
use strict;
use warnings;

my %seen;

while ( my $line = <DATA> ) {
    if ( $line =~ /LVT/ && ( $line =~ /ZN/ || $line =~ /Z/ ) )
        ####  matches the words LVT and ( Z or ZN)
    {
        $line =~ s/\/ZN?|\(|LVT\)//g;
        print $line unless $seen{$line}++;
    }
}

__DATA__
abchsfk/jshflka/ZN                       (cellLVT)
abchsfk/jshflka/ZN                       (cellLVT)

asjkfsa/sfklfkshfsf/Z                    (mobLVT)
asjkfsa/sfklfkshfsf/Z                    (mobLVT)

asjhfdjkfd/sjfdskjfhdk/hsakfshf/Z        (celLVT)
asjhfdjkfd/sjfdskjfhdk/hsakfshf/Z        (celLVT)

asjhdjs/jhskjds/ZN                       (abcLVT)
asjhdjs/jhskjds/ZN                       (abcLVT)

shdsjk/jhskd/ZN                          (xyzLVT)
shdsjk/jhskd/ZN                          (xyzLVT)

这给出了:

abchsfk/jshflka                       cell
asjkfsa/sfklfkshfsf                    mob
asjhfdjkfd/sjfdskjfhdk/hsakfshf        cel
asjhdjs/jhskjds                       abc
shdsjk/jhskd                          xyz

如果您认真对待它们进行排序 - 您使用的标准是什么?一个简单的字母数字排序:

print sort keys %seen;

给出:

abchsfk/jshflka                       cell
asjhdjs/jhskjds                       abc
asjhfdjkfd/sjfdskjfhdk/hsakfshf        cel
asjkfsa/sfklfkshfsf                    mob
shdsjk/jhskd                          xyz