Perl: Append two columns when the ID is duplicated

Time: 2015-09-04 12:49:22

Tags: perl append

If an id is duplicated, I want to append its app1 and app2 values and print that id only once.

Input:

id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|

Expected output:

id|Name|app1|app2
1|abc|234,265|231,321|
2|xyz|123|215|
3|asd|213|235|

3 answers:

Answer 0 (score: 2)

This should do the trick:

Edit

To see what %out contains (if it isn't clear), you can use

use Data::Dumper;

and print it via

print Dumper(\%out);
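A minimal hash-based sketch of the same idea, assuming `%out` maps each id to its name and its comma-joined app1/app2 values. That structure, the helper `build_out`, and the `@order` array are hypothetical, for illustration only; they are not necessarily what this answer used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in a stable order

# Hypothetical helper: fold pipe-separated rows into %out, keyed by id.
# Repeated ids get their app1/app2 values appended after a comma.
# @$order records each id the first time it is seen, so the output keeps
# the input order instead of Perl's unordered hash order.
sub build_out {
    my ($lines, $order) = @_;
    my %out;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        # The trailing '|' yields an empty last field, which split drops.
        my ($id, $name, $app1, $app2) = split /\|/, $copy;
        if (exists $out{$id}) {
            $out{$id}{app1} .= ",$app1";
            $out{$id}{app2} .= ",$app2";
        }
        else {
            push @$order, $id;
            $out{$id} = { name => $name, app1 => $app1, app2 => $app2 };
        }
    }
    return %out;
}

# Data rows from the question (the header line is printed separately).
my @rows = (
    "1|abc|234|231|\n",
    "2|xyz|123|215|\n",
    "1|abc|265|321|\n",
    "3|asd|213|235|\n",
);

my @order;
my %out = build_out(\@rows, \@order);
print Dumper(\%out);            # dump the structure for inspection
print "id|Name|app1|app2\n";
print join('|', $_, @{ $out{$_} }{qw(name app1 app2)}), "|\n" for @order;
```

Run as-is, it prints the dump followed by the merged table, whose first data row is `1|abc|234,265|231,321|`. Because values are appended with `.=`, a repeated value is appended again; there is no de-duplication here.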

Answer 1 (score: 1)

I'd solve it like this:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use 5.14.0;

my %stuff;

#extract the header row.
#use the regex to remove the linefeed, because
#we can't chomp it inline like this. 
#works since perl 5.14
#otherwise we could just chomp (@header) later. 
my ( $id, @header ) = split( /\|/, <DATA> =~ s/\n//r );

while (<DATA>) {

    #turn this row into a hash of key-values.
    my %row;
    ( $id, @row{@header} ) = split(/\|/);
    #print for diag 
    print Dumper \%row;

    #iterate each key, and push its value onto $stuff{$id}{$key}.
    foreach my $key ( keys %row ) {
        push( @{ $stuff{$id}{$key} }, $row{$key} );
    }
}

#print for diag    
print Dumper \%stuff;

print join ("|", "id", @header ),"\n";

#iterate ids in the hash
foreach my $id ( sort keys %stuff ) {

    #join this record by '|'.
    print join('|',
        $id,
        #turn inner arrays into comma separated via map.
        map {
            my %seen;
            #use grep to remove dupes - e.g. "abc,abc" -> "abc"
            join( ",", grep !$seen{$_}++, @$_ )
        } @{ $stuff{$id} }{@header}
        ),
        "\n";
}

__DATA__
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|

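This may be overkill for your application, but it should handle arbitrary column headings and any number of duplicates. It also de-duplicates the merged values, so two abc entries don't end up as abc,abc.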

The output is:

id|Name|app1|app2
1|abc|234,265|231,321
2|xyz|123|215
3|asd|213|235

Answer 2 (score: 1)

Here's my contribution: another approach that doesn't use a hash (in case you want something more memory-efficient):

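#!/usr/bin/perl
use strict;
use warnings;

my $basedir = 'E:\Perl\Input\\';
my $file    = 'doctor.txt';

# Send all print() output to the CSV file.
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
select(OUTFILE);
open(FH, '<', join('', $basedir, $file)) || die $!;

# Copy the header line through unchanged.
print(scalar(<FH>));

my @lastobj = (undef);
# Split each remaining line on a literal '|' (split('|') would split on
# every character) and sort the rows numerically by id.
foreach my $obj (sort { $a->[0] <=> $b->[0] }
                 map { chomp; [ split(/\|/) ] } <FH>) {
    if (defined($lastobj[0]) && $obj->[0] eq $lastobj[0]) {
        # Same id as the previous row: append app1 and app2.
        @lastobj = (@{$obj}[0..1],
                    $lastobj[2] . ',' . $obj->[2],
                    $lastobj[3] . ',' . $obj->[3]);
    }
    else {
        # New id: flush the previous record (if any) and start a new one.
        print(join('|', @lastobj), "|\n") if defined($lastobj[0]);
        @lastobj = @{$obj}[0..3];
    }
}
# Flush the final record.
print(join('|', @lastobj), "|\n");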

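Note that split without its third argument (LIMIT) drops trailing empty fields, which is why you have to add the final bar yourself. If you didn't chomp, you wouldn't need to supply the bar or the trailing newline, but you would have to deal with $obj->[4], which then holds the newline.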
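A short sketch of the split behaviour described above, using one row from the question. It also shows why the pattern must be `/\|/`: a quoted `'|'` is still treated as a regex (an empty alternation), so it splits between every character.

```perl
use strict;
use warnings;

my $line = "1|abc|234|231|";

# Default split (no LIMIT): trailing empty fields are removed.
my @default = split /\|/, $line;     # ("1", "abc", "234", "231")

# Negative LIMIT: trailing empty fields are kept.
my @keep = split /\|/, $line, -1;    # ("1", "abc", "234", "231", "")

# Without chomp, the newline survives as a final, non-empty field.
my @raw = split /\|/, "$line\n";     # ("1", "abc", "234", "231", "\n")

# '|' as a pattern matches the empty string, so this splits per character.
my @chars = split '|', "1|ab";       # ("1", "|", "a", "b")

printf "%d %d %d %d\n", scalar @default, scalar @keep, scalar @raw, scalar @chars;
```

The final line prints `4 5 5 4`.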