awk: regrouping lines by pattern

Time: 2011-05-31 09:06:50

Tags: perl awk

I have this input file:

     1.00 3 4
     93.00 2 3
     105.00 0 2
     119.00 0 2
     122.00 1 4
     202.00 1 3
     207.00 1 2
     210.00 1 4
     236.00 0 1
     237.00 0 4
     237.00 0 2
     240.00 1 3
     243.00 2 3
     243.00 3 4
     243.00 0 3
     275.00 0 4
     275.00 2 4
     353.00 0 3
     361.00 1 4
     411.00 0 1
     412.00 1 3
     425.00 0 3
     426.00 0 4
     455.00 1 4
     464.00 0 3
     520.00 0 4
     560.00 1 3
     561.00 1 4
     581.00 0 2

I would like output like the following, with this information computed:

    field1 field2 nbrepeated time1  time2  time3  time4
    3      4      1          1.00   243.00 0      0
    2      3      1          93.00  243.00 0      0
    0      2      3          105.00 119.00 237.00 581.00
    :      :      :          :      :      :      :
    :      :      :          :      :      :      :
    :      :      :          :      :      :      :




    <field1> <field2> <nbrepeated> <time1> <time2> <time3> <time4> are the columns.

3 answers:

Answer 0 (score: 0)

A shell script using Bash associative arrays can do this quite easily:

#!/bin/bash

declare -A times

#create an associative array containing the times
#for each combination of field1,field2
while read -r time f1 f2
do
  key="$f1 $f2"
  times["$key"]+=" $time"
done < data.txt

#print header
echo "field1 field2 nbrepeated time1 time2 time3 time4"

#iterate over the associative array and print
for key in "${!times[@]}"
do
    data=(${times[$key]})
    reps=$((${#data[@]}-1))
    #if there are fewer than 4 time entries, add zeros
    while [ ${#data[@]} -lt 4 ]
    do
        data[${#data[@]}]=0
    done

    echo "$key $reps ${data[@]}"
done

Output:

field1 field2 nbrepeated time1 time2 time3 time4
1 3 3 202.00 240.00 412.00 560.00
1 2 0 207.00 0 0 0
1 4 4 122.00 210.00 361.00 455.00 561.00
2 3 1 93.00 243.00 0 0
2 4 0 275.00 0 0 0
0 1 1 236.00 411.00 0 0
0 2 3 105.00 119.00 237.00 581.00
0 3 3 243.00 353.00 425.00 464.00
0 4 3 237.00 275.00 426.00 520.00
3 4 1 1.00 243.00 0 0
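
For readers who came for the awk in the title: awk arrays are associative too, so the same grouping can be sketched in awk. This is only a sketch, not part of the answer above. The function name `regroup` is made up for illustration, and it assumes every input line is `time field1 field2` separated by whitespace. Unlike the Bash script, it pads every row to the longest list of times and prints the pairs in order of first appearance.

```shell
# regroup (hypothetical name): read "time field1 field2" lines on stdin and
# print one row per (field1, field2) pair, in order of first appearance.
regroup() {
    awk '
    NF == 3 {
        key = $2 " " $3
        if (!(key in count)) order[++npairs] = key   # remember first-seen order
        n = ++count[key]
        times[key, n] = $1
        if (n > max) max = n                         # longest list of times so far
    }
    END {
        printf "field1 field2 nbrepeated"
        for (i = 1; i <= max; i++) printf " time%d", i
        print ""
        for (p = 1; p <= npairs; p++) {
            key = order[p]
            printf "%s %d", key, count[key] - 1      # nbrepeated = occurrences - 1
            for (i = 1; i <= max; i++) {
                if (i <= count[key]) { printf " %s", times[key, i] }
                else                 { printf " 0" }  # pad missing times with 0
            }
            print ""
        }
    }'
}

# Small demo on four lines taken from the question's input.
regroup <<'EOF'
     1.00 3 4
     93.00 2 3
     243.00 2 3
     243.00 3 4
EOF
```

With the full input saved as data.txt (the file name used in the script above), this could be run as `regroup < data.txt`.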

Answer 1 (score: 0)

A Perl version:

use strict;
use warnings;

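my %data;
# Group the times by the (field1, field2) pair.
while (my $line = <DATA>) {
    chomp($line);
    my @row = split(' ', $line);    # tolerate leading or repeated whitespace
    next unless @row == 3;          # skip blank or malformed lines
    my $key = "$row[1] $row[2]";    # separator keeps multi-digit fields apart
    push @{$data{$key}}, $row[0];
}

# Find the largest number of times recorded for any pair.
my $max = 0;
for my $key (keys %data) {
    if (scalar @{$data{$key}} > $max) {
        $max = scalar @{$data{$key}};
    }
}

# Print the header row.
{
    my @times;
    push @times, "time" . $_ for (1 .. $max);
    myFormat("field1", "field2", "nbrepeated", @times);
}

# Print one row per pair, padding missing times with 0.
for my $key (keys %data) {
    my ($f1, $f2) = split(' ', $key);
    my $nr = $#{$data{$key}};
    my @times = @{$data{$key}};
    for (my $i = 0; $i < $max; $i++) {
        if (! defined $times[$i] ) {
            $times[$i] = 0;
        }
    }
    myFormat($f1, $f2, $nr, @times);
}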

sub myFormat {
    printf "%-8s %-8s %-12s %-8s ", shift, shift, shift, shift;
    for my $line (@_) {
        printf "%-8s ", $line;
    }
    print "\n";
}

__DATA__
1.00 3 4
93.00 2 3
105.00 0 2
119.00 0 2
122.00 1 4
202.00 1 3
207.00 1 2
210.00 1 4
236.00 0 1
237.00 0 4
237.00 0 2
240.00 1 3
243.00 2 3
243.00 3 4
243.00 0 3
275.00 0 4
275.00 2 4
353.00 0 3
361.00 1 4
411.00 0 1
412.00 1 3
425.00 0 3
426.00 0 4
455.00 1 4
464.00 0 3
520.00 0 4
560.00 1 3
561.00 1 4
581.00 0 2

This produces the output:

field1   field2   nbrepeated   time1    time2    time3    time4    time5
0        1        1            236.00   411.00   0        0        0
0        4        3            237.00   275.00   426.00   520.00   0
1        2        0            207.00   0        0        0        0
1        4        4            122.00   210.00   361.00   455.00   561.00
0        2        3            105.00   119.00   237.00   581.00   0
3        4        1            1.00     243.00   0        0        0
0        3        3            243.00   353.00   425.00   464.00   0
2        4        0            275.00   0        0        0        0
2        3        1            93.00    243.00   0        0        0
1        3        3            202.00   240.00   412.00   560.00   0

The output is unsorted. If you specify how you want it ordered, sorting is easy to add.
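
As one possible ordering (an assumption, since the question does not say how the rows should be sorted), the rows could be sorted numerically by field1 and then field2 by piping the output through a small shell helper. The name `sort_rows` and the script name in the usage note are hypothetical.

```shell
# sort_rows (hypothetical name): keep the header line first, then sort the
# remaining rows numerically by field1 and then by field2.
sort_rows() {
    {
        IFS= read -r header && printf '%s\n' "$header"   # pass the header through
        sort -k1,1n -k2,2n                               # sort everything after it
    }
}

# Small demo on a few rows in the same layout as the output above.
sort_rows <<'EOF'
field1   field2   nbrepeated   time1    time2
3        4        1            1.00     243.00
0        4        3            237.00   275.00
0        1        1            236.00   411.00
EOF
```

If the Perl script were saved as, say, regroup.pl, the usage would be `perl regroup.pl | sort_rows`.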

Answer 2 (score: 0)

A Bash version:

declare -A t

while read tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < times.txt

max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done

{
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" time$i; done
    echo ' "avg"'

    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        printf "%d %d %d" $f1 $f2 $(($# - 1))
        for i in $(seq $max); do
            printf " %.1f" ${1-0}
            shift
        done

        # calculate the average gap between consecutive times
        # (for a pair with a single time, report that time itself)
        set -- ${t[$key]}
        n=$(( $# - 1 ))
        if [[ $n -eq 0 ]]; then
            avg=$1
        else
            prev=$1
            shift
            total="0"
            while [[ $# -gt 0 ]]; do
                total="$total + ($1 - $prev)"
                prev=$1
                shift
            done
            avg=$( echo "scale=1; ($total)/$n" | bc )
        fi
        printf " %.1f\n" $avg
    done
} | column -t

This produces the following output:

field1  field2  nbrepeated  time1  time2  time3  time4  time5  "avg"
2       4       0           275.0  0.0    0.0    0.0    0.0    275.0
2       3       1           93.0   243.0  0.0    0.0    0.0    150.0
1       3       3           202.0  240.0  412.0  560.0  0.0    119.3
1       2       0           207.0  0.0    0.0    0.0    0.0    207.0
0       4       3           237.0  275.0  426.0  520.0  0.0    94.3
0       2       3           105.0  119.0  237.0  581.0  0.0    158.6
0       3       3           243.0  353.0  425.0  464.0  0.0    73.6
0       1       1           236.0  411.0  0.0    0.0    0.0    175.0
1       4       4           122.0  210.0  361.0  455.0  561.0  109.7
3       4       1           1.0    243.0  0.0    0.0    0.0    242.0
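
Reading the script above, the avg column appears to be the mean gap between consecutive times for each pair, and for a pair with a single time it is just that time. Since consecutive differences telescope, that mean equals (last - first) / (count - 1). A minimal sketch of that shortcut follows; `avg_gap` is a hypothetical helper name. Note that awk's printf rounds, whereas `bc` with scale=1 truncates, so the last digit can differ (for example 158.7 here versus 158.6 above for the pair 0 2).

```shell
# avg_gap (hypothetical name): print the mean gap between consecutive times
# given as arguments, using (last - first) / (count - 1).
avg_gap() {
    printf '%s\n' "$@" | awk '
        NR == 1 { first = $1 }
        { last = $1 }
        END {
            if (NR < 2) printf "%.1f\n", first                     # single time: report it as-is
            else        printf "%.1f\n", (last - first) / (NR - 1) # telescoped sum of gaps
        }'
}

# Times for the pair 1 3 from the question's input.
avg_gap 202.00 240.00 412.00 560.00
```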