在此处查看问题: match-rows-based-on-first-field-and-combine-second-field
如果满足以下条件,您将如何解决这个问题:
示例文件:
External ID Item Name Item Description Release Date Expiry Date Weight Template ID Enabled EntityId Classifications Address N/A City State Zipcode Country of domain purchase is made from Title Cover Image Link Author
411280 Shade me Shade me 04-May-2017 01-Jan-9999 0 Y -1 Teen Shade me MC.GIF http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer
411280 Shade me Shade me 04-May-2017 01-Jan-9999 0 Y -1 Books Shade me MC.GIF http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer
413036 Now that's what I call music! Now that's what I call music! 04-May-2017 01-Jan-9999 0 Y -1 Teen Now that's what I call music! MC.GIF http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=413036
挑战是匹配重复的ID并合并类别。
Shade me MC.GIF http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer
411280 Shade me Shade me 04-May-2017 01-Jan-9999 0 Y -1 Teen;Books
更新
while ( <FILE> ) {
next if 1..1;
chomp $_;
my ( $id, $name, $desc, $reldate, $expdate, $weight, $temp, $enabled, $ent, $class, $addr, $na, $city, $state, $zip, $country, $title, $img, $link, $auth ) = split /\t/ , $_;
if ( ! $merge{$id} ) {
$merge{$id} = "$id, $name, $desc, $reldate, $expdate, $weight, $temp, $enabled, $ent, $class, $addr, $na, $city, $state, $zip, $country, $title, $img, $link, $auth";
} else {
$merge{$class} .= ";$class" if ( $merge{$id} ne $class )
}
}
p %merge;
给我提出问题的一句话是:
$merge{$class} .= ";$class" if ( $merge{$id} ne $class )
你可以看到我需要做什么 - 合并类字段。不工作
答案 0 :(得分:2)
我会将文件加载到某个数据结构中并记住每个唯一的列值,然后根据需要打印它们。例如。如在此示例中一样(使用|
,因为它的分隔符更好地显示为\t
):
#!/usr/bin/env perl
use 5.024;
use warnings;
use Data::Dumper;
my $records;
my $numcols;
while(<DATA>) {
chomp;
my(@cols) = split /\|/, $_, -1;
$numcols = @cols if( $. == 1 );
die "Wrong number of columns (@{[scalar @cols]} instead of $numcols) in line $." unless (@cols == $numcols);
$records->{$cols[0]}->[0] = $. unless $records->{$cols[0]}; #remember the line# of the 1st apperance
for(my $c = 1; $c < $numcols; $c++) { #skip the id (col[0])
$records->{$cols[0]}->[$c]->{$cols[$c]}++;
}
}
# if want, check the data-structure
#say Dumper($records);
for my $id (sort {$records->{$a}->[0] <=> $records->{$b}->[0]} keys %$records) {
say join("|",
$id,
map { join(';', sort grep {/\S/} keys $records->{$id}->[$_]->%*) } 1 .. $#{$records->{$id}} #skip col[0]
);
}
__DATA__
ID|Name1|Name2|Name3
id1|c11|c12|c13
id1|c11|c12|c13
id2|c21|c22|c23
id1|c31|c12|c13
id3|c41||c43
id1|c51|c12|c13
id1|c31||c13
id1|c11||c13
id1|c31|c12|c13
id2|c21|c22|c83
id4|c91|c92|
打印
ID|Name1|Name2|Name3
id1|c11;c31;c51|c12|c13
id2|c21|c22|c23;c83
id3|c41||c43
id4|c91|c92|
将一些shell用于漂亮的列perl script.pl | sed 's/||/| |/g' | column -s'|' -t
ID Name1 Name2 Name3
id1 c11;c31;c51 c12 c13
id2 c21 c22 c23;c83
id3 c41 c43
id4 c91 c92
答案 1 :(得分:-2)
小文件的简单方法可以如上所述:
#!/usr/bin/env perl
use common::sense;
use DDP;
my %merge;
while ( <DATA> )
{
next if 1..1;
chomp $_;
my ( $id, $text, $category ) = split /,/ , $_;
if ( ! $merge{$id} )
{
$merge{$id} = "$id,$text,$category";
}
else
{
my ( undef, undef , $c ) = split /,/ , $merge{$id};
if ( $c !~ /\b$category\b/ )
{
$merge{$id} .= ";$category";
}
}
}
p %merge;
__DATA__
Id Title Category
12345,My favorite martian,aliens
13444,Texas Meat,BBQ
12345,My favorite martian,aliens
<强>输出:强>
{
12345 "12345,My favorite martian,aliens;space",
13444 "13444,Texas Meat,BBQ"
}
考虑到你不想要重复的类别会有所帮助。