I have two tab-delimited files that I need to align with each other. For example:
File 1:    File 2:
AAA 123    BBB 345
BBB 345    CCC 333
CCC 333    DDD 444
(These are big files, potentially thousands of lines!)
What I would like is for the output to look like this:
AAA 123
BBB 345    BBB 345
CCC 333    CCC 333
           DDD 444
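Ideally I would like to do this in Perl, but I am not sure how. Any help would be greatly appreciated.
Answer 0: (score: 1)
If it is just a matter of building a data structure, this can be quite easy.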
#!/usr/bin/env perl
# usage: script.pl file1 file2 ...
use strict;
use warnings;
my %data;
while (<>) {
    chomp;
    my ($key, $value) = split;
    push @{$data{$key}}, $value;
}
use Data::Dumper;
print Dumper \%data;
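Then you can output it in whatever format you like. If it is really about using the files exactly as they are, then it is a little trickier.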
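As one possible way to get from a structure like this to the layout in the question, here is a minimal sketch. It differs from the code above in that it keeps one hash per file, so that it is known which side a key came from, and it assumes each key appears at most once per file. The names `read_pairs` and `align_pairs` are made up for illustration.

```perl
#!/usr/bin/env perl
# Sketch only: read_pairs() and align_pairs() are hypothetical helper names.
# Assumes each key appears at most once per file.
use strict;
use warnings;

# Read "key<whitespace>value" lines from a filehandle into a hash reference.
sub read_pairs {
    my ($fh) = @_;
    my %pairs;
    while (my $line = <$fh>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;    # skip blank or malformed lines
        $pairs{$key} = $value;
    }
    return \%pairs;
}

# Build one tab-separated output row per key, in sorted key order.
# Keys missing from the left file get two empty leading columns.
sub align_pairs {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } keys(%$left), keys(%$right);
    my @rows;
    for my $key (sort keys %seen) {
        my @l = exists $left->{$key}  ? ($key, $left->{$key})  : ('', '');
        my @r = exists $right->{$key} ? ($key, $right->{$key}) : ();
        push @rows, join("\t", @l, @r);
    }
    return @rows;
}

# Demo with in-memory copies of the sample data from the question.
my $text1 = "AAA\t123\nBBB\t345\nCCC\t333\n";
my $text2 = "BBB\t345\nCCC\t333\nDDD\t444\n";
open my $fh1, '<', \$text1 or die $!;
open my $fh2, '<', \$text2 or die $!;
print "$_\n" for align_pairs(read_pairs($fh1), read_pairs($fh2));
```

Because everything is held in memory and sorted at the end, the input files do not need to be sorted beforehand; a few thousand lines is well within what this handles comfortably.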
Answer 1: (score: 0)
Assuming the files are sorted,
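use strict;
use warnings;

# Assumed here: the two file names are passed as command-line arguments.
open(my $fh1, '<', $ARGV[0]) or die $!;
open(my $fh2, '<', $ARGV[1]) or die $!;

# Read one line and return (key, value); returns an empty list at end of file.
sub get {
    my ($fh) = @_;
    my $line = <$fh>;
    return () if !defined($line);
    return split(' ', $line);
}

my ($key1, $val1) = get($fh1);
my ($key2, $val2) = get($fh2);

# Merge step: advance whichever side has the smaller key, or both on a match.
while (defined($key1) && defined($key2)) {
    if ($key1 lt $key2) {
        print(join("\t", $key1, $val1), "\n");
        ($key1, $val1) = get($fh1);
    }
    elsif ($key1 gt $key2) {
        print(join("\t", '', '', $key2, $val2), "\n");
        ($key2, $val2) = get($fh2);
    }
    else {
        print(join("\t", $key1, $val1, $key2, $val2), "\n");
        ($key1, $val1) = get($fh1);
        ($key2, $val2) = get($fh2);
    }
}

# Flush whatever is left in file 1.
while (defined($key1)) {
    print(join("\t", $key1, $val1), "\n");
    ($key1, $val1) = get($fh1);
}

# Flush whatever is left in file 2 (in the right-hand columns).
while (defined($key2)) {
    print(join("\t", '', '', $key2, $val2), "\n");
    ($key2, $val2) = get($fh2);
}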
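Since the merge above relies on both files being in ascending string order (the order that `lt` and `gt` compare by), it may be worth checking that assumption before running it. A minimal sketch of such a check follows; `first_unsorted_line` is a hypothetical helper name, not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line number of the first key that sorts
# before the key preceding it (string comparison), or 0 if no key does.
sub first_unsorted_line {
    my ($fh) = @_;
    my $prev;
    while (my $line = <$fh>) {
        my ($key) = split ' ', $line;
        next unless defined $key;                    # ignore blank lines
        return $. if defined $prev && $key lt $prev; # $. is the current line number
        $prev = $key;
    }
    return 0;
}

# Demo on in-memory data: "AAA" on line 3 follows "CCC", so it is out of order.
my $sample = "BBB\t345\nCCC\t333\nAAA\t123\n";
open my $fh, '<', \$sample or die $!;
my $bad = first_unsorted_line($fh);
print $bad ? "not sorted at line $bad\n" : "sorted\n";
```

If the check fails, sorting the files first or falling back to a hash-based approach avoids misaligned output.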
Answer 2: (score: 0)
Like ikegami's answer, this assumes the contents of the files are arranged as shown in your example.
use strict;
use warnings;
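open my $file1, '<', 'file1.txt' or die $!;
open my $file2, '<', 'file2.txt' or die $!;
# The first line of file 1 has no partner in file 2, so print it on its own.
my $file1_line = <$file1>;
print $file1_line;
# Pair each line of file 2 with the next line of file 1, if there is one.
while ( my $file2_line = <$file2> ) {
    if ( defined( $file1_line = <$file1> ) ) {
        chomp $file1_line;
        print $file1_line;
    }
    # One tab after a file 1 pair; two tabs to skip the empty file 1 columns.
    my $tabs = $file1_line ? "\t" : "\t\t";
    print "$tabs$file2_line";
}
close $file1;
close $file2;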
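Looking back at your example, you show some identical key/value pairs in both files. Given this, it looks like you want to show the pairs unique to file 1, the pairs unique to file 2, and the pairs common to both. If that is the case (and you are not trying to match the files' pairs by key or by value), you can use List::Compare: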
use strict;
use warnings;
use List::Compare;
open my $file1, '<', 'file1.txt' or die $!;
my @file1 = <$file1>;
close $file1;
open my $file2, '<', 'file2.txt' or die $!;
my @file2 = <$file2>;
close $file2;
my $lc = List::Compare->new(\@file1, \@file2);
my @file1Only = $lc->get_Lonly; # L(eft array)only
for(@file1Only) { print }
my @bothFiles = $lc->get_intersection;
for(@bothFiles) { chomp; print "$_\t$_\n" }
my @file2Only = $lc->get_Ronly; # R(ight array)only
for(@file2Only) { print "\t\t$_" }
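If installing List::Compare is not an option, roughly the same three groups can be produced with plain hashes. This is a sketch of a substitute technique, not List::Compare's own implementation. `split_lines` is a hypothetical name, and lines are compared as whole strings, so they are assumed to be chomped first.

```perl
use strict;
use warnings;

# Hypothetical helper: given two array refs of chomped lines, return three
# array refs: lines only in the first, lines in both, lines only in the second.
# Each group is de-duplicated and sorted.
sub split_lines {
    my ($left, $right) = @_;
    my %in_left  = map { $_ => 1 } @$left;
    my %in_right = map { $_ => 1 } @$right;
    my @left_only  = sort grep { !$in_right{$_} } keys %in_left;
    my @both       = sort grep {  $in_right{$_} } keys %in_left;
    my @right_only = sort grep { !$in_left{$_}  } keys %in_right;
    return (\@left_only, \@both, \@right_only);
}

# Demo with the sample data from the question.
my @file1 = ("AAA\t123", "BBB\t345", "CCC\t333");
my @file2 = ("BBB\t345", "CCC\t333", "DDD\t444");
my ($only1, $common, $only2) = split_lines(\@file1, \@file2);
print "$_\n"     for @$only1;    # file 1 only
print "$_\t$_\n" for @$common;   # in both files, printed side by side
print "\t\t$_\n" for @$only2;    # file 2 only, shifted into the right columns
```

As with the List::Compare version, the output is grouped (file 1 only, then common, then file 2 only) rather than interleaved by key.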
Answer 3: (score: 0)
Similar to Joel Berger's answer, but this approach lets you keep track of which files contain a given key:
my %data;
while (my $line = <>) {
    chomp $line;
    my ($k) = $line =~ /^(\S+)/;
    $data{$k}{line} = $line;
    $data{$k}{$ARGV} = 1;    # $ARGV is the name of the file currently being read
}
use Data::Dumper;
print Dumper(\%data);
Output:
$VAR1 = {
          'CCC' => {
                     'other.dat' => 1,
                     'data.dat' => 1,
                     'line' => 'CCC 333'
                   },
          'BBB' => {
                     'other.dat' => 1,
                     'data.dat' => 1,
                     'line' => 'BBB 345'
                   },
          'DDD' => {
                     'other.dat' => 1,
                     'line' => 'DDD 444'
                   },
          'AAA' => {
                     'data.dat' => 1,
                     'line' => 'AAA 123'
                   }
        };
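One way to go from this structure to the layout asked for in the question, as a sketch: `format_rows` is a hypothetical helper, and `data.dat` / `other.dat` are simply the file names that appear in the dump above. It assumes a key carries the same line in both files, since `line` is overwritten by whichever file is read last.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the %data structure into aligned rows.
# $left and $right are the file names used as flags inside each entry.
sub format_rows {
    my ($data, $left, $right) = @_;
    my @rows;
    for my $key (sort keys %$data) {
        my $entry = $data->{$key};
        my $line  = $entry->{line};
        if    ($entry->{$left} && $entry->{$right}) { push @rows, "$line\t$line" }
        elsif ($entry->{$left})                     { push @rows, $line }
        elsif ($entry->{$right})                    { push @rows, "\t\t$line" }
    }
    return @rows;
}

# Demo using the same shape of data as the dump above.
my %data = (
    AAA => { 'data.dat'  => 1, line => "AAA\t123" },
    BBB => { 'data.dat'  => 1, 'other.dat' => 1, line => "BBB\t345" },
    CCC => { 'data.dat'  => 1, 'other.dat' => 1, line => "CCC\t333" },
    DDD => { 'other.dat' => 1, line => "DDD\t444" },
);
print "$_\n" for format_rows(\%data, 'data.dat', 'other.dat');
```

Sorting the keys at output time means the input files can be in any order.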