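Shell or AWK script to group duplicate entries by the first field and compute differences using the last field

Asked: 2013-01-29 19:01:24

Tags: bash shell

I want to write a script (shell or awk) that prints the entries sharing the same $1 (first field) grouped together, and then uses the last field to find the difference between the last and first entries, or alternatively reports the difference between successive values of each repeated entry.

For example, my file has the following entries: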

counter1 is 100
counter2 is 200
counter3 is 300
counter1 is 1000
counter2 is 2000
counter3 is 3000
counter1 is 10000
counter2 is 20000
counter3 is 30000

I want to print:

counter1 is 100
counter1 is 1000
counter1 is 10000
counter2 is 200
counter2 is 2000
counter2 is 20000
counter3 is 300
counter3 is 3000
counter3 is 30000

Each counter has a series of increasing values, so I also want to find the difference between the values of the same counter:

counter1 is 100
counter1 is 1000 | difference 1000-100 = 900
counter1 is 10000 | difference 10000-100 = 9900

I am able to print the duplicate entries, but I cannot group them together; they come out in the same order as in the file.

MacBook-Air:linuxscripts jimdev$ awk 'NR==FNR && a[$1]++ {b[$1];next} $1 in b' FS=" " countr.txt countr.txt 

counter1 is 100
counter2 is 200
counter3 is 300
counter1 is 1000
counter2 is 2000
counter3 is 3000
counter1 is 10000
counter2 is 20000
counter3 is 30000

2 answers:

Answer 0 (score: 2)

Does this work for you?

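sort countr.txt | grep -v '^$' | awk '
BEGIN { field1 = "" ; firstval = 0 }
     # New counter: print the line and remember its first value.
     $1 != field1 { print $0 ; field1 = $1 ; firstval = $NF ; next }
     # Same counter as the previous line: print the difference from the first value.
     { print $0 " | difference " $NF "-" firstval " = " ($NF - firstval) }'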

Here is the output for the input file shown in the post:

counter1 is 100
counter1 is 1000 | difference 1000-100 = 900
counter1 is 10000 | difference 10000-100 = 9900
counter2 is 200
counter2 is 2000 | difference 2000-200 = 1800
counter2 is 20000 | difference 20000-200 = 19800
counter3 is 300
counter3 is 3000 | difference 3000-300 = 2700
counter3 is 30000 | difference 30000-300 = 29700
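
One caveat: plain sort compares whole lines as text, so within a counter the values are ordered lexically rather than in file order (for example, "counter1 is 1000" sorts before "counter1 is 900"). The following is a minimal sketch, not part of the original answer, that sorts on the first field only and keeps the original order within each counter. It assumes your sort supports the -s (stable) flag, as GNU and BSD sort do; the function name group_diff and the sample values are illustrative.

```shell
# Hypothetical helper: reads "name is value" lines on stdin.
group_diff() {
    # -k1,1 sorts on the first field only; -s keeps file order for equal keys.
    sort -s -k1,1 | awk '
        NF == 0 { next }                                  # skip blank lines
        $1 != key { key = $1; first = $NF; print; next }  # first line of a counter
        { print $0 " | difference " $NF "-" first " = " ($NF - first) }'
}

# Sample input whose values are deliberately not in lexical order.
printf '%s\n' 'counter1 is 900' 'counter2 is 200' \
              'counter1 is 1000' 'counter2 is 250' | group_diff
```

With this sample input, the 900 line stays ahead of the 1000 line, so the reported difference is 1000-900 = 100 rather than a negative number.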

Answer 1 (score: 1)

Assume your data is in a file named data.txt. You can get this result simply with sort and awk, using an if statement (or patterns):

sort data.txt | awk 'BEGIN{last = ""; value = 0;} {if ($1 == last) {print $1" is "$3" | difference "$3"-"value" = "($3-value)}else{last = $1; value = $3; print $1" is "$3;}}' -

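Explanation: first we sort the input so that the counters appear in ascending order. Then we use an AWK expression:

  1. We use 2 temporary variables: last, which stores the current counter, and value, which stores the first value of that counter. We initialize them in the BEGIN section of the AWK script: BEGIN{last = ""; value = 0;}
  2. Now, for each line, we execute the following code: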

    if ($1 == last) {
        print $1" is "$3" | difference "$3"-"value" = "($3-value);
    } else {
        last = $1;
        value = $3;
        print $1" is "$3;
    }
    

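    Line 1: compare the first field (the counter) with last, which stores the most recent counter label, to decide whether we should print a difference.

    Line 2: if the current line has the same counter label as the previous line, print the difference.

    Line 3: otherwise, this is the base case, so we save the current counter label for comparison with the following lines, save its value for computing the differences, and print the line.

  3. In short: if the new line has the same counter label as the previous line, we keep the stored values and compute the difference from that counter's first value. Otherwise, we store the new counter label (in last) and its value (in value), and just print the line.
  4. Here is the output for the sample input: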

      counter1 is 100
      counter1 is 1000 | difference 1000-100 = 900
      counter1 is 10000 | difference 10000-100 = 9900
      counter2 is 200
      counter2 is 2000 | difference 2000-200 = 1800
      counter2 is 20000 | difference 20000-200 = 19800
      counter3 is 300
      counter3 is 3000 | difference 3000-300 = 2700
      counter3 is 30000 | difference 30000-300 = 29700
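
The question also mentions reporting the difference between each repeated entry and the one before it. The following is a sketch of that variant in a single awk pass without sort, assuming the input fits in memory. It is not from either answer; the function name consecutive_diff and the array names are illustrative.

```shell
# Hypothetical helper: reads "name is value" lines on stdin.
consecutive_diff() {
    awk '
    NF == 0 { next }                  # skip blank lines
    !($1 in prev) {                   # first time this counter is seen
        order[++n] = $1               # remember first-appearance order
        out[$1] = $0
        prev[$1] = $NF
        next
    }
    {                                 # repeated counter: diff against the previous value
        out[$1] = out[$1] "\n" $0 " | difference " $NF "-" prev[$1] " = " ($NF - prev[$1])
        prev[$1] = $NF
    }
    END { for (i = 1; i <= n; i++) print out[order[i]] }'
}

printf '%s\n' 'counter1 is 100' 'counter2 is 200' 'counter1 is 1000' \
              'counter2 is 2000' 'counter1 is 10000' | consecutive_diff
```

Here the third counter1 line reports 10000-1000 = 9000 instead of 10000-100 = 9900, and the counters are printed in the order they first appear in the file rather than in sorted order.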