Minimum and maximum of a column per task in a Unix file?

Time: 2016-11-23 13:40:03

Tags: shell awk

I have a file with the task name in the first column and the time taken to complete that task in the second column, like this:

Task2, 3421
Task3, 3300
Task1, 1000
Task2, 1100
Task3, 1200
Task3, 1209
Task4, 1299
Task3, 1289
Task1, 1389
Task2, 1211
Task5, 1216
Task2, 1416
Task1, 2100
Task6, 2416
Task5, 2216
Task7, 1116

Now I have to find the minimum and maximum time for each task and print them in the following format:

task, min time, max time

e.g.

Task1, 1000, 2100 ( from the data given above)

7 answers:

Answer 0 (score: 4)

You can try awk:

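awk '
    # Split on the comma plus any following spaces so $2 is the bare number
    BEGIN{FS=", *"; OFS=", "}
    !($1 in max) || $2>max[$1]{max[$1]=$2}
    !($1 in min) || $2<min[$1]{min[$1]=$2}
    END{
        # for-in order is unspecified, so sort the result
        for(k in max){print k, min[k], max[k]}
    }' input.txt | sort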

This gives:

Task1, 1000, 2100
Task2, 1100, 3421
Task3, 1200, 3300
Task4, 1299, 1299
Task5, 1216, 2216
Task6, 2416, 2416
Task7, 1116, 1116
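
The "in" guard matters mostly for the minimum: an unset array element compares as empty/zero, so without the guard no positive time would ever be smaller than it. A minimal sketch of the same single-pass idea, wrapped in a shell function so it can read a file or stdin; the name minmax_by_task is hypothetical, and adding 0 to $2 is an assumption made here to force numeric comparison regardless of how the field was split:

```shell
# minmax_by_task is a hypothetical helper name, not part of the answer above.
minmax_by_task() {
    awk -F', *' -v OFS=', ' '
        # First sighting of a task: seed both min and max with this value.
        !($1 in min) { min[$1] = max[$1] = $2 + 0; next }
        # Later sightings: adding 0 forces a numeric comparison.
        $2 + 0 < min[$1] { min[$1] = $2 + 0 }
        $2 + 0 > max[$1] { max[$1] = $2 + 0 }
        END { for (k in min) print k, min[k], max[k] }
    ' "$@" | sort
}

# Small inline sample instead of the full data file.
printf '%s\n' 'Task2, 3421' 'Task1, 1000' 'Task2, 1100' 'Task1, 2100' | minmax_by_task
```

On this four-line sample the function prints one line for Task1 (1000, 2100) and one for Task2 (1100, 3421).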

Answer 1 (score: 1)

Another approach is to sort by column 1, then by column 2, and pick the first and last value for each task:

awk -F, '{arr[$1]=arr[$1] $2} END {for(key in arr) print key, arr[key]}' <(sort -t, -k1,1 -k2,2n file) | awk '{OFS=", "; print $1, $2, $NF}'

Sample run:

$ cat file 
Task2, 3421
Task3, 3300
Task1, 1000
Task2, 1100
Task3, 1200
Task3, 1209
Task4, 1299
Task3, 1289
Task1, 1389
Task2, 1211
Task5, 1216
Task2, 1416
Task1, 2100
Task6, 2416
Task5, 2216
Task7, 1116
$ sort -t, -k1,1 -k2,2n file
Task1, 1000
Task1, 1389
Task1, 2100
Task2, 1100
Task2, 1211
Task2, 1416
Task2, 3421
Task3, 1200
Task3, 1209
Task3, 1289
Task3, 3300
Task4, 1299
Task5, 1216
Task5, 2216
Task6, 2416
Task7, 1116
$ awk -F, '{arr[$1]=arr[$1] $2} END {for(key in arr) print key, arr[key]}' <(sort -t, -k1,1 -k2,2n file) | awk '{OFS=", "; print $1, $2, $NF}'
Task1, 1000, 2100
Task2, 1100, 3421
Task3, 1200, 3300
Task4, 1299, 1299
Task5, 1216, 2216
Task6, 2416, 2416
Task7, 1116, 1116

Answer 2 (score: 1)

Using gawk's arrays of arrays:

gawk 'BEGIN{OFS=FS=","}
      $2>a[$1]["max"]{a[$1]["max"]=$2}
      $2<a[$1]["min"] || !a[$1]["min"] {a[$1]["min"]=$2}
      END {for (i in a){
             print i, a[i]["min"],a[i]["max"]
             }
      }' file

Answer 3 (score: 1)

Here is another alternative:

$ join -t, <(sort file){,} | sort -k1,1 -k2n -k3nr | rev | uniq -2 | rev
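
The one-liner is dense, so here is a sketch of the same idea spelled out step by step. It joins the sorted file with itself to get every pair of times per task, orders the pairs so that the (min, max) pair comes first, and keeps that first line. Two things differ from the original and are assumptions of this sketch: a temporary file replaces the bash-only process substitution and brace expansion, and an awk first-seen filter is swapped in for the rev | uniq | rev step. The name minmax_by_join is hypothetical.

```shell
# minmax_by_join is a hypothetical wrapper name; the steps mirror the one-liner above.
minmax_by_join() {
    sorted=$(mktemp) || return 1
    # join needs both inputs sorted on the join field.
    sort -t, -k1,1 "$1" > "$sorted"
    # Joining the file with itself on the task name yields every
    # "task, time_a, time_b" pair for that task.
    join -t, "$sorted" "$sorted" |
        # Order each task by time_a ascending, then time_b descending,
        # so the first line per task is "task, min, max".
        sort -t, -k1,1 -k2,2n -k3,3nr |
        # Keep only the first line seen for each task.
        awk -F, '!seen[$1]++'
    rm -f "$sorted"
}
```

Call it as minmax_by_join file. The self-join produces n*n lines for a task with n entries, so this is best suited to small inputs.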

Answer 4 (score: 1)

Another answer, using sort, sed and awk:

sort -k1,1 -k2n input.txt | sed -r ':a;N;$!ba;:b;s/(Task[0-9]+, )([0-9 ,]+)\n?\1([0-9]+)/\1\2, \3/g;tb;' | awk 'BEGIN{FS=OFS=", ";}{print $1, $2, $NF}'

An alternative solution using only sort and sed:

sort -k1,1 -k2n input.txt | sed -r ':a;N;$!ba;:b;s/(Task[0-9]+, )([0-9 ,]+)\n?\1([0-9]+)/\1\2, \3/g;tb;' | sed -r -e 's/^([^ ]+)\s([^ ]+)\s.*\s([^ ]+)/\1 \2 \3/' -e 's/^([^ ]+)\s([^ ]+)$/\1 \2, \2/'

This gives:

Task1, 1000, 2100
Task2, 1100, 3421
Task3, 1200, 3300
Task4, 1299, 1299
Task5, 1216, 2216
Task6, 2416, 2416
Task7, 1116, 1116

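Answer 5 (score: 0)

sort on the first and second columns, then awk it. The nice thing about this solution (the awk part) is that it does not hold the data in memory and dump everything at the end; instead it prints the data for the previous $1 as soon as a new one is found. Here: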

$ sort -t, -k1,1 -k2n foo |                     # sort by task, then numerically by time
awk '!($1 in min)    {min[$1]=$2}               # first of each is always min (and max)
      ($1 in min)    {max[$1]=$2}               # every current one is always max
       $1!=p && NR>1 {print p, min[p], max[p]}  # if $1 differs from previous, print previous
                     {p=$1}                     # p is current for next round
       END           {print p, min[p], max[p]}' # dump buffer
Task1, 1000 2100
Task2, 1100 3421
Task3, 1200 3300
Task4, 1299 1299
Task5, 1216 2216
Task6, 2416 2416
Task7, 1116 1116
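
The output above has no comma between the minimum and the maximum. A minimal sketch of the same sort-then-stream idea that prints the comma-separated format from the question and keeps only three scalars instead of per-task arrays; the function name minmax_streaming and the variable names prev, lo and hi are hypothetical:

```shell
# minmax_streaming is a hypothetical helper name.
minmax_streaming() {
    sort -t, -k1,1 -k2,2n "$1" |
    awk -F', *' -v OFS=', ' '
        # A new task starts: flush the finished one, then remember the new minimum.
        $1 != prev { if (NR > 1) print prev, lo, hi; prev = $1; lo = $2 }
        # Input is sorted by time, so the latest value is the largest so far.
        { hi = $2 }
        # Flush the last task, unless the input was empty.
        END { if (NR > 0) print prev, lo, hi }
    '
}
```

Call it as minmax_streaming foo. Because each task is printed as soon as the next one begins, the output order follows the sort order and needs no extra sort at the end.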

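Answer 6 (score: 0)

This is mostly bash; if that causes you problems, I can replace the awk command with something else... (e.g. colrm, if the time always starts in the same column.)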

# Keep a list of already processed task names
already_processed=""

# Read only the first column (the task name) from the data file
while IFS=',' read -r task _; do
  # Skip blank lines
  [ -n "$task" ] || continue
  # If the task has already been processed, skip it and go to the next line
  if echo "$already_processed" | grep -qF ":$task:"; then
    continue
  else
    # Select all lines for this task from the data file, take the
    # second column and sort it numerically to find the minimum and maximum.
    # Anchoring the pattern keeps Task1 from also matching Task10.
    MIN=$(grep "^$task," "$1" | awk -F', *' '{print $2}' | sort -n | head -1)
    MAX=$(grep "^$task," "$1" | awk -F', *' '{print $2}' | sort -n | tail -1)
    # Record the task as processed so that it appears only once in the output
    already_processed="$already_processed:$task:"
    # Print the output in the wanted format.
    echo "${task}, ${MIN}, ${MAX}"
  fi

done < "$1"

Make sure your data file ends with a newline; otherwise read does not process the last line.

Example:

bash <name_of_script_file> <name_of_data_file> | sort    
Task1, 1000, 2100
Task2, 1100, 3421
Task3, 1200, 3300
Task4, 1299, 1299
Task5, 1216, 2216
Task6, 2416, 2416
Task7, 1116, 1116