I have multi-line records in a text file that I want to deduplicate using Perl.
Records are delimited by the string "#end-of-record" and look like this:
    CAPTAIN GIBLET'S NEWT CORRAL
    555 RANDOM ST
    TARDIS, CT 99999
    We regret to inform you that we must repossess your pants in part due to your being 6 months late on payments. But mostly it's maliciousness. :)
    TOTAL DUE: $30.00
    #end-of-record
Here is my initial attempt:
    #!/usr/bin/perl -w
    use strict;
    {
        local $/ = "#end-of-record";
        my %seen;
        while ( my $record = <> ) {
            if (not exists $seen{$record}) {
                print $record;
                $seen{$record} = 1;
            }
        }
    }
This prints every record, duplicates included. Where did I go wrong?
Update
The code above does appear to work.
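If duplicates still slip through on some inputs, one possible cause is whitespace around the separator. With $/ set to "#end-of-record", the newline that follows each separator becomes the first character of the next record, so the first record in the file never compares equal to a later copy of itself. Stray trailing spaces or CRLF line endings have the same effect. The following is a minimal sketch that trims each record before using it as a hash key. The sub name unique_records and the demo data are made up for illustration, and the output layout (separator on its own line) is an assumption about the file format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the distinct record bodies read from $fh, in first-seen order.
# (unique_records is a hypothetical helper name, not part of the original script.)
sub unique_records {
    my ($fh) = @_;
    local $/ = "#end-of-record";       # one record per read
    my ( %seen, @unique );
    while ( my $record = <$fh> ) {
        chomp $record;                 # drop the trailing "#end-of-record"
        $record =~ s/^\s+|\s+$//g;     # trim whitespace, including the newline left by the previous separator
        next if $record eq '';         # skip the blank chunk after the last separator
        push @unique, $record unless $seen{$record}++;
    }
    return @unique;
}

# Demo on an in-memory file (made-up data): the first and third records are identical.
my $sample = "ALPHA\nline two\n#end-of-record\nBETA\n#end-of-record\nALPHA\nline two\n#end-of-record\n";
open my $demo_fh, '<', \$sample or die "cannot open in-memory file: $!";

# Assumption: the separator is written on its own line after each record.
print "$_\n#end-of-record\n" for unique_records($demo_fh);
```

To run this against a real file, open it with a regular three-argument open and pass that handle to the sub instead of the in-memory one.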
Answer 0: (score: 0)
    gawk 'BEGIN { ORS = RS = "#end-of-record\n" } !seen[$0]++' yourfile
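The increment-and-test idiom used in the awk pattern has a direct Perl counterpart, which may make it easier to compare with the script in the question. The sketch below uses a made-up list of strings rather than real records. The first time a value is seen its counter is 0, so the negation is true and the value is kept; the post-increment then bumps the counter so later copies are rejected.

```perl
use strict;
use warnings;

my %seen;                                  # value => number of times encountered
my @records = ( "a", "b", "a", "c", "b" ); # illustrative data only

# Keep a value only on its first appearance; %seen ends up holding the counts.
my @unique = grep { !$seen{$_}++ } @records;

print "@unique\n";                         # prints: a b c
```

This does the same job as the explicit exists check followed by an assignment, in a single expression.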