Question

I have multi-line records in a text file that I want to deduplicate using Perl.

The records are separated by the string "#end-of-record" and look like this:

CAPTAIN GIBLET'S NEWT CORRAL
555 RANDOM ST
TARDIS, CT 99999

We regret to inform you that we must repossess your pants in part due to your being 6 months late on payments. But mostly it's maliciousness. :)

TOTAL DUE: $30.00

#end-of-record

Here is my initial attempt:

    #!/usr/bin/perl -w

    use strict;

    {
            local $/ = "#end-of-record";

            my %seen;
            while ( my $record = <> ) {

                    if (not exists $seen{$record}) {
                            print $record;
                            $seen{$record} = 1;
                    }
            }

    }

This prints every record, including the duplicates. Where did I go wrong?

Update
The code above appears to work after all.

Answer 1

    gawk 'BEGIN { ORS = RS = "#end-of-record\n" } !seen[$0]++' yourfile
