Deduplicating multi-line records with Perl

Time: 2014-11-21 03:27:29

Tags: perl deduplication

I have multi-line records in a text file that I want to deduplicate using Perl.

Records are separated by the string "#end-of-record" and look like this:

CAPTAIN GIBLET'S NEWT CORRAL
555 RANDOM ST
TARDIS, CT 99999

We regret to inform you that we must repossess your pants in part due to your being 6 months late on payments. But mostly it's maliciousness. :)

TOTAL DUE: $30.00

#end-of-record

Here is my initial attempt:

    #!/usr/bin/perl -w

    use strict;

    {
            local $/ = "#end-of-record";

            my %seen;
            while ( my $record = <> ) {

                    if (not exists $seen{$record}) {
                            print $record;
                            $seen{$record} = 1;
                    }
            }

    }

This prints out every record, including the duplicates. Where am I going wrong?

Update

The code above appears to work.
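One case where the code above can still print a duplicate: because `$/` is "#end-of-record" without a trailing newline, the newline that follows each separator becomes the start of the next record. The first record in the file therefore has no leading newline while every later one does, so a later copy of the first record compares as different. Below is a minimal sketch that trims surrounding whitespace before comparing. The sub name `unique_records` and the sample data are made up for illustration, and trimming is an assumption about what should count as "the same record".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the unique records read from a filehandle, in first-seen order.
# (unique_records is a hypothetical helper name, not from the question.)
sub unique_records {
    my ($fh) = @_;
    local $/ = "#end-of-record";    # same separator as in the question
    my ( %seen, @unique );
    while ( my $record = <$fh> ) {
        chomp $record;              # strips the trailing "#end-of-record"
        ( my $key = $record ) =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $key eq '';         # skip the empty chunk after the last separator
        push @unique, $key unless $seen{$key}++;
    }
    return @unique;
}

# Demo on an in-memory file; the sample data is invented for illustration.
my $text = "A\n#end-of-record\nB\n#end-of-record\nA\n#end-of-record\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "$_\n#end-of-record\n" for unique_records($fh);
close $fh;
```

Note that this normalizes the blank lines between records in the output; whitespace inside a record is left untouched. To read a real file, pass a handle opened on that file instead of the in-memory string.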

1 answer:

Answer 0: (score: 0)

    gawk 'BEGIN { ORS = RS = "#end-of-record\n" } !seen[$0]++' yourfile