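I'm new to Perl and am currently using it for some text processing. The input file has four tab-separated columns. For rows that share the same ID (the first two columns), I want to find the minimum of column 3 and the maximum of column 4 and put them on a single line. The input file looks like this: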
A A1 1 5
A A1 9 18
A A1 23 40
A A2 20 30
A A2 35 43
B A1 2 10
B A1 12 30
B A1 35 100
C A9 2 40
C A9 45 70
Desired output:
A A1 1 40
A A2 20 43
B A1 2 100
C A9 2 70
Answer 0 (score: 2)
Perl from the command line:
perl -anE'
$k = join "\t", @F[0,1];
$h{$k} or push @r, $k;
(!defined($_) or $_ > $F[2]) and $_ = $F[2] for $h{$k}{m};
(!defined($_) or $_ < $F[3]) and $_ = $F[3] for $h{$k}{M};
}{
say join "\t", $_, @{$h{$_}}{qw(m M)} for @r
' file
Output:
A A1 1 40
A A2 20 43
B A1 2 100
C A9 2 70
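For readers less used to the -n and -a switches, the one-liner can be unrolled into an ordinary script. This is only a sketch under assumptions, not the answerer's code: the helper name min_max_rows is made up for illustration, the sample rows are copied from the question, and both columns use an explicit defined check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): takes a list of
# whitespace-separated lines and returns one "ID1\tID2\tmin\tmax" row per group.
sub min_max_rows {
    my @lines = @_;
    my (%h, @order);
    for my $line (@lines) {
        my @F = split ' ', $line;              # what -a does for each line
        next unless @F >= 4;                   # skip blank or short lines
        my $k = join "\t", @F[0, 1];           # group key: first two columns
        push @order, $k unless exists $h{$k};  # remember first-seen order
        my $rec = $h{$k} //= {};
        $rec->{m} = $F[2] if !defined($rec->{m}) or $rec->{m} > $F[2];
        $rec->{M} = $F[3] if !defined($rec->{M}) or $rec->{M} < $F[3];
    }
    # This corresponds to the code after "}{", which runs once at end of input.
    return map { join "\t", $_, $h{$_}{m}, $h{$_}{M} } @order;
}

# Sample rows taken from the question.
my @sample = (
    "A\tA1\t1\t5",  "A\tA1\t9\t18",  "A\tA1\t23\t40",
    "A\tA2\t20\t30", "A\tA2\t35\t43",
    "B\tA1\t2\t10", "B\tA1\t12\t30", "B\tA1\t35\t100",
    "C\tA9\t2\t40", "C\tA9\t45\t70",
);
print "$_\n" for min_max_rows(@sample);
```

With the sample rows this prints the same four lines as the output above. Keeping @order separately is what preserves the input order of the groups, since Perl hashes are unordered.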
Answer 1 (score: 0)
Something like this?
use strict;
use warnings;
open my $fh, '<', 'input-data.txt'
    or die "Cannot open input-data.txt: $!";
# Keep track of the current minimum and maximum
# values while we read the file.
#
my (%val1_min, %val2_max);
while (<$fh>) ## loop through lines of file
{
    chomp; ## remove trailing "\n" character
    # Split on sequences of whitespace
    #
    my ($key1, $key2, $val1, $val2) = split /\s+/;
    # Record a new minimum if there is no old
    # minimum, or if the old minimum is higher
    # than the current value.
    #
    $val1_min{$key1}{$key2} = $val1
        if !defined($val1_min{$key1}{$key2})
        or $val1_min{$key1}{$key2} > $val1;
    # Record a new maximum if there is no old
    # maximum, or if the old maximum is lower
    # than the current value.
    #
    $val2_max{$key1}{$key2} = $val2
        if !defined($val2_max{$key1}{$key2})
        or $val2_max{$key1}{$key2} < $val2;
}
close $fh;
# Now we need to produce some output.
#
# Loop through the first level of keys.
#
for my $key1 (sort keys %val1_min)
{
    # Loop through the second level of keys.
    #
    for my $key2 (sort keys %{$val1_min{$key1}})
    {
        # Print a line of output to STDOUT.
        #
        printf(
            "%-4s %-4s %3d %3d\n",   ## formatting string
            $key1,                   ## first key
            $key2,                   ## second key
            $val1_min{$key1}{$key2}, ## minimum first value
            $val2_max{$key1}{$key2}, ## maximum second value
        );
    }
}
Answer 2 (score: 0)
Using Perl from the command line:
perl -MList::Util=max,min -lane '
$k = join "\t", splice @F, 0, 2;
push @k, $k if !$v{$k};
push @{$v{$k}[$_]}, $F[$_] for (0..$#F);
}{
print join "\t", $_, min(@{$v{$_}[0]}), max(@{$v{$_}[1]}) for @k;
' file.txt
Output:
A A1 1 40
A A2 20 43
B A1 2 100
C A9 2 70
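Answer 3 (score: 0)
Read the data file line by line, use the combination of the first two columns as the key of a record hash, and keep the minimum of column 3 and the maximum of column 4 in that hash. If you want to preserve the order of the keys, also push them onto an array.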
#!/usr/bin/perl
use strict;
use warnings;
my (%record, @key);
while (<>) {
    chomp;
    my @field = split /\s+/;
    my $key = join "\t", @field[0,1];
    # Remember each key the first time it is seen, to keep input order.
    push @key, $key unless $record{$key};
    # Test with defined() so that a stored 0 is not mistaken for "unset".
    if (!defined($record{$key}{min}) || $record{$key}{min} > $field[2]) {
        $record{$key}{min} = $field[2];
    }
    if (!defined($record{$key}{max}) || $record{$key}{max} < $field[3]) {
        $record{$key}{max} = $field[3];
    }
}
for my $key (@key) {
    # Join only the fields, then add the newline, to avoid a trailing tab.
    print join("\t", $key, $record{$key}{min}, $record{$key}{max}), "\n";
}
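One detail that matters when tracking a running minimum or maximum like this: the "nothing stored yet" test is safer written with defined than with plain truthiness, because 0 is false in Perl. The following is a minimal sketch of the difference. Both helper names are made up for illustration and are not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: running minimum using a defined() check.
sub running_min {
    my $min;
    for my $v (@_) {
        $min = $v if !defined($min) or $v < $min;
    }
    return $min;
}

# Same loop, but using truthiness as the "not set yet" test.
# A stored 0 looks "unset", so the next value overwrites it.
sub running_min_truthy {
    my $min;
    for my $v (@_) {
        $min = $v if !$min or $v < $min;
    }
    return $min;
}

print running_min(0, 5, 3), "\n";         # prints 0
print running_min_truthy(0, 5, 3), "\n";  # prints 3, the 0 was lost
```

For the sample data in the question no column contains 0, so either test gives the same result there.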