I am trying to merge 3 files.
File 1: a 4-column tab-separated file
ID Column_1 Column_2 Column_3
A 100 100001 X
B 100 99999 Y
C 100 88888 Z
D 99 100001 Y
E 99 88888 Z
File 2: a 3-column tab-separated file
Column_4 Column_5 Column_6
100 100001 X
100 99999 Y
100 88888 Z
99 100001 Y
99 88888 Z
File 3: a 4-column tab-separated file
Column_7 Column_8 Column_9 Column_10
100 120000 100 100001
100 66666 100 99999
100 77777 100 88888
99 100000 99 100001
99 44444 99 88888
I want to produce this merged file:
ID Column_1 Column_2 Column_3 Column_6 Column_7 Column_8
A 100 100001 X X 100 120000
B 100 99999 Y Y 100 66666
C 100 88888 Z Z 100 77777
D 99 100001 Y Y 99 100000
E 99 88888 Z Z 99 44444
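I tried a hash-based approach keyed on Column_1 and Column_2, but then I have two keys and many values. How can I use a hash to parse these files?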
Answer (score: 2)
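You are on the right track with hashes; you just need to compute each table's key from the right columns. An annotated view of the solution:
- read the rows of table 1
  - compute the key $key from Column_1 and Column_2
  - push $key onto @order, i.e. table 1 defines the row order of the output table
  - store the row in the hash %table under $key
- read the rows of table 2
  - compute the key $key from Column_4 and Column_5
  - push the columns that go from this table into the final table onto the array ref stored in %table under $key
- read the rows of table 3 the same way, computing the key from Column_7 and Column_10
- loop over the keys in @order
  - fetch the array ref stored in %table under $key
  - dump the row as TSV to STDOUT
#!/usr/bin/perl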
use warnings;
use strict;
use autodie;
use Text::CSV;
my $csv = Text::CSV->new({
    binary   => 1,
    eol      => "\n",
    sep_char => "\t",
}) or die "CSV creation\n";
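# Read a TSV file and call $code with each row (an array ref)
sub read_file($$) {
    my($file, $code) = @_;
    open(my $fh, '<', $file);    # autodie makes a failed open fatal
    while (my $row = $csv->getline( $fh )) {
        $code->($row);
    }
    # report a parse error if getline stopped before end of file
    $csv->eof or $csv->error_diag();
    close($fh);
}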
# Output table + row order
my %table;
my @order;
# Table 1
read_file($ARGV[0], sub {
    my($row) = @_;
    #print "ROW 1 @{ $row }\n";
    my($col1, $col2) = @{ $row }[1,2];
    # column_1, column_2 define key; the tab separator keeps pairs such
    # as 99/88888 and 9/988888 from producing the same key
    my $key = "${col1}\t${col2}";
    #print "KEY 1 ${key}\n";
    # table 1 defines order
    push(@order, $key);
    # ID, column_1, column_2, column_3 from table 1
    $table{$key} = $row;
});
# Table 2
read_file($ARGV[1], sub {
    my($row) = @_;
    #print "ROW 2 @{ $row }\n";
    my($col4, $col5, $col6) = @{ $row };
    # column_4, column_5 define key
    my $key = "${col4}\t${col5}";
    #print "KEY 2 ${key}\n";
    # column_6 from table 2
    push(@{ $table{$key} }, $col6);
});
# Table 3
read_file($ARGV[2], sub {
    my($row) = @_;
    #print "ROW 3 @{ $row }\n";
    my($col7, $col8, $col9, $col10) = @{ $row };
    # column_7, column_10 define key
    my $key = "${col7}\t${col10}";
    #print "KEY 3 ${key}\n";
    # column_7, column_8 from table 3
    push(@{ $table{$key} }, $col7, $col8);
});
foreach my $key (@order) {
    $csv->print(\*STDOUT, $table{$key});
}
exit 0;
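Building the composite key by plain concatenation, as in "${col1}${col2}", can map two different column pairs to the same string. The sample data happens to be safe, but here is a minimal sketch of the problem and of one fix, joining the two values with a tab. The helper name make_key and the pair 9/988888 are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash key from two column values. Joining with "\t" cannot collide
# as long as the values themselves contain no tab, which holds for fields
# read from a tab-separated file.
sub make_key {
    my ($first, $second) = @_;
    return join("\t", $first, $second);
}

# Plain concatenation: two different column pairs give the same key.
# (99, 88888) is row E of table 1; (9, 988888) is a made-up pair.
my $naive_e     = "99" . "88888";
my $naive_other = "9" . "988888";
print "concatenated keys equal: ", ($naive_e eq $naive_other ? "yes" : "no"), "\n";

# With a separator the two pairs stay distinct.
my $safe_e     = make_key("99", "88888");
my $safe_other = make_key("9", "988888");
print "separated keys equal: ", ($safe_e eq $safe_other ? "yes" : "no"), "\n";
```

It prints "concatenated keys equal: yes" followed by "separated keys equal: no". A nested hash ($table{$col1}{$col2}) would also avoid the collision, but a single joined key keeps the @order bookkeeping simple.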
Test run:
$ perl dummy.pl dummy1.txt dummy2.txt dummy3.txt
A 100 100001 X X 100 120000
B 100 99999 Y Y 100 66666
C 100 88888 Z Z 100 77777
D 99 100001 Y Y 99 100000
E 99 88888 Z Z 99 44444
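The files in the question start with a header line, while the test run output has none, which suggests the dummy files were headerless. With headers, the script above would print table 1's header as an ordinary row without the Column_6, Column_7 and Column_8 names, because the header lines of tables 2 and 3 produce keys that never match. A sketch of one way to handle headers, assuming each table has already been read into an array of row array refs (for example by collecting each $row in the read_file callbacks). The sub name merge_with_header is hypothetical, and the exists guard is a design choice that is not in the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge three tables given as array refs of rows (each row an array ref).
# The first row of every table is assumed to be its header line.
sub merge_with_header {
    my ($t1, $t2, $t3) = @_;
    my ($h1, @rows1) = @$t1;
    my ($h2, @rows2) = @$t2;
    my ($h3, @rows3) = @$t3;

    # Merged header: all of table 1, Column_6 from table 2,
    # Column_7 and Column_8 from table 3.
    my @merged = ([ @$h1, $h2->[2], @{$h3}[0, 1] ]);

    my (%table, @order);
    for my $row (@rows1) {
        my $key = join("\t", @{$row}[1, 2]);    # Column_1, Column_2
        push @order, $key;                       # table 1 defines the row order
        $table{$key} = [@$row];                  # copy, so the input is not modified
    }
    for my $row (@rows2) {
        my $key = join("\t", @{$row}[0, 1]);    # Column_4, Column_5
        # Only extend rows that exist in table 1; avoids autovivifying stray keys.
        push @{ $table{$key} }, $row->[2] if exists $table{$key};
    }
    for my $row (@rows3) {
        my $key = join("\t", @{$row}[0, 3]);    # Column_7, Column_10
        push @{ $table{$key} }, @{$row}[0, 1] if exists $table{$key};
    }
    push @merged, map { $table{$_} } @order;
    return \@merged;
}

# Demo with the sample data from the question.
my @table1 = ([qw(ID Column_1 Column_2 Column_3)],
    [qw(A 100 100001 X)], [qw(B 100 99999 Y)], [qw(C 100 88888 Z)],
    [qw(D 99 100001 Y)],  [qw(E 99 88888 Z)]);
my @table2 = ([qw(Column_4 Column_5 Column_6)],
    [qw(100 100001 X)], [qw(100 99999 Y)], [qw(100 88888 Z)],
    [qw(99 100001 Y)],  [qw(99 88888 Z)]);
my @table3 = ([qw(Column_7 Column_8 Column_9 Column_10)],
    [qw(100 120000 100 100001)], [qw(100 66666 100 99999)],
    [qw(100 77777 100 88888)],   [qw(99 100000 99 100001)],
    [qw(99 44444 99 88888)]);

print join("\t", @$_), "\n" for @{ merge_with_header(\@table1, \@table2, \@table3) };
```

Running it prints the merged header line followed by rows A to E, tab-separated, matching the desired output in the question.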