I have this input file:
1.00 3 4
93.00 2 3
105.00 0 2
119.00 0 2
122.00 1 4
202.00 1 3
207.00 1 2
210.00 1 4
236.00 0 1
237.00 0 4
237.00 0 2
240.00 1 3
243.00 2 3
243.00 3 4
243.00 0 3
275.00 0 4
275.00 2 4
353.00 0 3
361.00 1 4
411.00 0 1
412.00 1 3
425.00 0 3
426.00 0 4
455.00 1 4
464.00 0 3
520.00 0 4
560.00 1 3
561.00 1 4
581.00 0 2
I want the output to look like the following,
with this information computed:
field1 field2 nbrepeated time1 time2 time3 time4
3 4 1 1.00 243.0 0 0
2 3 1 93.0 243.0 0 0
0 2 2 93.00 119.00 237.00 581.00
: : : : : : :
: : : : : : :
: : : : : : :
The columns are <field1>, <field2>, <nbrepeated>, <time1>, <time2>, <time3> and <time4>.
Answer 0 (score: 0)
A shell script using Bash associative arrays (Bash 4 or later) can do this quite easily:
#!/bin/bash
declare -A times

# Build an associative array mapping each "field1 field2" pair
# to a space-separated list of the times at which it occurs
while read -r t f1 f2
do
    times["$f1 $f2"]+=" $t"
done < data.txt

# Print the header
echo "field1 field2 nbrepeated time1 time2 time3 time4"

# Iterate over the associative array and print one row per pair
for key in "${!times[@]}"
do
    read -r -a data <<< "${times[$key]}"
    reps=$(( ${#data[@]} - 1 ))
    # If there are fewer than 4 time entries, pad with zeros
    while [ "${#data[@]}" -lt 4 ]
    do
        data+=(0)
    done
    echo "$key $reps ${data[*]}"
done
Output:
field1 field2 nbrepeated time1 time2 time3 time4
1 3 3 202.00 240.00 412.00 560.00
1 2 0 207.00 0 0 0
1 4 4 122.00 210.00 361.00 455.00 561.00
2 3 1 93.00 243.00 0 0
2 4 0 275.00 0 0 0
0 1 1 236.00 411.00 0 0
0 2 3 105.00 119.00 237.00 581.00
0 3 3 243.00 353.00 425.00 464.00
0 4 3 237.00 275.00 426.00 520.00
3 4 1 1.00 243.00 0 0
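The header above hard-codes four time columns, while the `1 4` row carries five times. A minimal sketch, assuming the same whitespace-separated `time field1 field2` input on stdin, that sizes the header from the largest group and pads every row to that width. The function name `group_times` is hypothetical, and row order is still whatever Bash's associative array yields.

```shell
#!/bin/bash
# Hypothetical helper: reads "time field1 field2" lines on stdin and prints
# one row per (field1, field2) pair, padded to the widest group.
group_times() {
    local -A times=()
    local t f1 f2 key i max=0
    local -a data

    # Collect the times for each "field1 field2" pair
    while read -r t f1 f2; do
        [[ -z $t ]] && continue          # skip blank lines
        times["$f1 $f2"]+=" $t"
    done

    # Find the largest number of times recorded for any pair
    for key in "${!times[@]}"; do
        read -r -a data <<< "${times[$key]}"
        (( ${#data[@]} > max )) && max=${#data[@]}
    done

    # Header with one timeN column per slot
    printf 'field1 field2 nbrepeated'
    for (( i = 1; i <= max; i++ )); do printf ' time%d' "$i"; done
    printf '\n'

    # One row per pair: the key, the repeat count, then the times padded with 0
    for key in "${!times[@]}"; do
        read -r -a data <<< "${times[$key]}"
        printf '%s %d' "$key" $(( ${#data[@]} - 1 ))
        for (( i = 0; i < max; i++ )); do printf ' %s' "${data[i]:-0}"; done
        printf '\n'
    done
}
```

Usage: `group_times < data.txt`. With the full input above, `max` is 5, so the header gains a `time5` column and every shorter row is padded to match.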
Answer 1 (score: 0)
Perl version:
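use strict;
use warnings;

my %data;
while (my $line = <DATA>) {
    chomp($line);
    next unless $line =~ /\S/;    # skip blank lines
    my @row = split ' ', $line;   # split on any run of whitespace
    # Join the two fields with a separator; plain concatenation would make
    # multi-digit values collide (e.g. "1 23" and "12 3" both become "123")
    my $key = "$row[1]:$row[2]";
    push @{$data{$key}}, $row[0];
}

# Find the largest number of times recorded for any key
my $max = 0;
for my $key (keys %data) {
    my $count = scalar @{$data{$key}};
    $max = $count if $count > $max;
}

# Print the header with one timeN column per slot
{
    my @times;
    push @times, "time" . $_ for (1 .. $max);
    myFormat("field1", "field2", "nbrepeated", @times);
}

for my $key (keys %data) {
    my ($f1, $f2) = split /:/, $key;
    my $nr = $#{$data{$key}};    # repeats = number of times - 1
    my @times = @{$data{$key}};
    # Pad with zeros up to $max columns
    for (my $i = 0; $i < $max; $i++) {
        $times[$i] = 0 unless defined $times[$i];
    }
    myFormat($f1, $f2, $nr, @times);
}

# Print the three leading columns, then every time column, left-aligned
sub myFormat {
    printf "%-8s %-8s %-12s ", shift, shift, shift;
    printf "%-8s ", $_ for @_;
    print "\n";
}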
__DATA__
1.00 3 4
93.00 2 3
105.00 0 2
119.00 0 2
122.00 1 4
202.00 1 3
207.00 1 2
210.00 1 4
236.00 0 1
237.00 0 4
237.00 0 2
240.00 1 3
243.00 2 3
243.00 3 4
243.00 0 3
275.00 0 4
275.00 2 4
353.00 0 3
361.00 1 4
411.00 0 1
412.00 1 3
425.00 0 3
426.00 0 4
455.00 1 4
464.00 0 3
520.00 0 4
560.00 1 3
561.00 1 4
581.00 0 2
This produces:
field1 field2 nbrepeated time1 time2 time3 time4 time5
0 1 1 236.00 411.00 0 0 0
0 4 3 237.00 275.00 426.00 520.00 0
1 2 0 207.00 0 0 0 0
1 4 4 122.00 210.00 361.00 455.00 561.00
0 2 3 105.00 119.00 237.00 581.00 0
3 4 1 1.00 243.00 0 0 0
0 3 3 243.00 353.00 425.00 464.00 0
2 4 0 275.00 0 0 0 0
2 3 1 93.00 243.00 0 0 0
1 3 3 202.00 240.00 412.00 560.00 0
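The output is not sorted, because Perl hash iteration order is arbitrary. If you specify how you want it sorted, adding that is easy.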
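One way to sort it, assuming numeric order on field1 and then field2 is what you want, is to keep the header line in place and pipe the remaining rows through `sort`. This is a sketch; the function name `sort_rows` is hypothetical, and the same approach works for the Bash answers, whose associative-array order is also unspecified.

```shell
#!/bin/bash
# Hypothetical helper: copies the first (header) line through unchanged,
# then sorts the remaining rows numerically by field1, then by field2.
sort_rows() {
    local header
    IFS= read -r header
    printf '%s\n' "$header"
    sort -k1,1n -k2,2n
}
```

Usage, with a hypothetical script name: `perl group.pl | sort_rows`. On a pipe, Bash's `read` consumes input one byte at a time, so `sort` still receives every line after the header.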
Answer 2 (score: 0)
Bash version, which also computes an average column:
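#!/bin/bash
declare -A t

# Append each time to the list kept for its "field1:field2" key
while read -r tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < times.txt

# Find the largest number of times recorded for any key
max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done

{
    # Header: one timeN column per slot, plus the average
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" time$i; done
    echo ' "avg"'

    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        printf "%d %d %d" $f1 $f2 $(($# - 1))
        # Print every time, substituting 0 once the list runs out
        for i in $(seq $max); do
            printf " %.1f" ${1-0}
            shift
        done

        # Average gap between consecutive times
        # (a key with a single time prints that time itself)
        set -- ${t[$key]}
        n=$(( $# - 1 ))
        if [[ $n -eq 0 ]]; then
            avg=$1
        else
            prev=$1
            shift
            total="0"
            while [[ $# -gt 0 ]]; do
                total="$total + ($1 - $prev)"
                prev=$1
                shift
            done
            # bc with scale=1 truncates rather than rounds
            avg=$( echo "scale=1; ($total)/$n" | bc )
        fi
        printf " %.1f\n" $avg
    done
} | column -t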
This produces:
field1 field2 nbrepeated time1 time2 time3 time4 time5 "avg"
2 4 0 275.0 0.0 0.0 0.0 0.0 275.0
2 3 1 93.0 243.0 0.0 0.0 0.0 150.0
1 3 3 202.0 240.0 412.0 560.0 0.0 119.3
1 2 0 207.0 0.0 0.0 0.0 0.0 207.0
0 4 3 237.0 275.0 426.0 520.0 0.0 94.3
0 2 3 105.0 119.0 237.0 581.0 0.0 158.6
0 3 3 243.0 353.0 425.0 464.0 0.0 73.6
0 1 1 236.0 411.0 0.0 0.0 0.0 175.0
1 4 4 122.0 210.0 361.0 455.0 561.0 109.7
3 4 1 1.0 243.0 0.0 0.0 0.0 242.0
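The `avg` column is the mean gap between consecutive times of a pair. Because the consecutive differences telescope, (t2 - t1) + (t3 - t2) + ... + (tn - t(n-1)) = tn - t1, the summing loop can be replaced by one subtraction of the first time from the last. A minimal sketch, assuming the times are passed in ascending order (as they are in the input file); the function name `avg_gap` is hypothetical, and it uses `awk` instead of `bc`, so it rounds where the answer truncates (e.g. 158.7 instead of 158.6 for `0 2`).

```shell
#!/bin/bash
# Hypothetical helper: prints the mean gap between consecutive times given as
# arguments in ascending order. The sum of consecutive differences telescopes
# to (last - first), so only the two endpoints are needed.
avg_gap() {
    if (( $# == 1 )); then
        # Single occurrence: print the time itself, as the answer above does
        awk -v t="$1" 'BEGIN { printf "%.1f\n", t }'
        return
    fi
    # awk's printf rounds to one decimal (bc with scale=1 would truncate)
    awk -v first="$1" -v last="${!#}" -v n=$(( $# - 1 )) \
        'BEGIN { printf "%.1f\n", (last - first) / n }'
}
```

For example, `avg_gap 202.00 240.00 412.00 560.00` gives (560 - 202) / 3, which prints 119.3, the same value as the `1 3` row above.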