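Separate and count the elements of a list using conditions

Time: 2013-12-17 12:16:13

Tags: shell for-loop awk while-loop

I want to separate and count the elements of an input list. input.txt contains two columns: $1 is the element ID and $2 is its ratio (a number).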

ENSG001 12.3107448237
ENSG007 4.3602275
ENSG008 2.9918420285
ENSG009 1.035588
ENSG010 0.999864
ENSG012 0.569833
ENSG013 0.495325
ENSG014 0.253893
ENSG015 0.125389
ENSG017 0.012568
ENSG018 -0.135689
ENSG020 -0.4938497942
ENSG022 -0.6429221854
ENSG024 -1.1759339381
ENSG029 -4.2722999766
ENSG030 -11.8447513281

I want to divide the ratios into the following categories:

Greater than or equal to 2
Between 1 and 2
Between 0.5 and 1
Between -0.5 and 0.5
Between -1 and -0.5
Between -2 and -1
Less than or equal to -2

Then I want to print the count for each category to a separate output file, results.txt:

Total   16
 > 2    3
 1 to 2  1
 0.5 to 1    2
-0.5 to 0.5  6
-0.5 to -1   1
-1 to -2     1
 < -2    2

I can do this on the command line with the following commands:

awk '$2 > 2 {print $1,$2}' input.txt | wc -l
awk '$2 > 1 && $2 < 2 {print $1,$2}' input.txt | wc -l
awk '$2 > 0.5 && $2 < 1 {print $1,$2}' input.txt | wc -l
awk '$2 > -0.5 && $2 < 0.5 {print $1,$2}' input.txt | wc -l
awk '$2 > -1 && $2 < -0.5 {print $1,$2}' input.txt | wc -l
awk '$2 > -2 && $2 < -1 {print $1,$2}' input.txt | wc -l
awk '$2 < -2 {print $1,$2}' input.txt | wc -l

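I think there is a quicker way to do this with a shell script that uses a while or for loop, but I don't know how. Any suggestions would be great.

5 answers:

Answer 0 (score: 3)

You can process the file just once. The straightforward way is: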

awk '$2>=2{a++;next}
$2>0.5 && $2 <1 {b++;next}
$2>-0.5 && $2 <0.5 {c++;next}
...
$2<=-2{x++;next}
END{print "total:",NR;
    print ">2:",a;
    print "0.5-1:",b;
    ...
    print "<-2:",x
}' file
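For reference, a filled-in version of the same single-pass idea might look like the sketch below. The counter names a to g, the category labels, and the boundary handling are assumptions, not part of the answer above. The sample rows are copied from the question so that the block runs without input.txt.

```shell
#!/bin/sh
# Sample rows copied from the question, so the sketch runs without input.txt.
sample='ENSG001 12.3107448237
ENSG007 4.3602275
ENSG008 2.9918420285
ENSG009 1.035588
ENSG010 0.999864
ENSG012 0.569833
ENSG013 0.495325
ENSG014 0.253893
ENSG015 0.125389
ENSG017 0.012568
ENSG018 -0.135689
ENSG020 -0.4938497942
ENSG022 -0.6429221854
ENSG024 -1.1759339381
ENSG029 -4.2722999766
ENSG030 -11.8447513281'

# One pass over the data: each record matches exactly one rule.
# Assumption: lower bounds are inclusive, except that exactly -2
# is counted in the "<= -2" category.
result=$(printf '%s\n' "$sample" | awk '
    $2 >= 2                  { a++; next }
    $2 >= 1    && $2 < 2     { b++; next }
    $2 >= 0.5  && $2 < 1     { c++; next }
    $2 >= -0.5 && $2 < 0.5   { d++; next }
    $2 >= -1   && $2 < -0.5  { e++; next }
    $2 >  -2   && $2 < -1    { f++; next }
    $2 <= -2                 { g++ }
    END {
        # Adding 0 makes an empty category print as 0, not as an empty string.
        print "Total",       NR
        print ">= 2",        a + 0
        print "1 to 2",      b + 0
        print "0.5 to 1",    c + 0
        print "-0.5 to 0.5", d + 0
        print "-1 to -0.5",  e + 0
        print "-2 to -1",    f + 0
        print "<= -2",       g + 0
    }')

printf '%s\n' "$result"
```

With a real file, the printf pipeline would be replaced by passing input.txt to awk as its argument and redirecting the output to results.txt.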

Answer 1 (score: 2)

You can use sort to order the entries numerically and then count the entries in each interval. For example, with your input:

cut -f 2 -d ' ' input.txt | sort -nr | awk '
    BEGIN { split("2 1 0.5 -0.5 -1 -2", inter); i = 1; }
    {
        if (i > 6) { ++c; next; }
        if ($1 >= inter[i]) ++c;
        else if (i == 1) { print c, "greater than", inter[i++]; c = 1; }
        else { print c, "between", inter[i - 1], "and", inter[i++]; c = 1; }
    }
    END { print c, "lower than", inter[i - 1]; }'

If your input is already sorted in descending order, you can even shorten the command line to:

awk 'BEGIN { split("2 1 0.5 -0.5 -1 -2", inter); i = 1; }
{
    if (i > 6) { ++c; next; }
    if ($2 >= inter[i]) ++c;
    else if (i == 1) { print c, "greater than", inter[i++]; c = 1; }
    else { print c, "between", inter[i - 1], "and", inter[i++]; c = 1; }
}
END { print c, "lower than", inter[i - 1]; }' input.txt

The resulting output, which you can reformat as you like:

3 greater than 2
1 between 2 and 1
2 between 1 and 0.5
6 between 0.5 and -0.5
1 between -0.5 and -1
1 between -1 and -2
2 lower than -2
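Note that the commands above advance by one interval each time a value falls below the current threshold, so they rely on sorted input and on every interval containing at least one value. A variant that drops both requirements could look like the sketch below. It reuses the inter array from the answer; the counter array c, the loop, and the sample rows are assumptions added for illustration.

```shell
#!/bin/sh
# Made-up, unsorted sample with several empty intervals (not from the question).
sample='x1 3.2
x2 -7
x3 0.7
x4 2'

result=$(printf '%s\n' "$sample" | awk '
    BEGIN { n = split("2 1 0.5 -0.5 -1 -2", inter, " ") }
    {
        # Walk down the thresholds until the value reaches one of them.
        # Bin n+1 collects everything below the last threshold.
        i = 1
        while (i <= n && $2 + 0 < inter[i] + 0) i++
        c[i]++
    }
    END {
        print c[1] + 0, "greater than or equal to", inter[1]
        for (i = 2; i <= n; i++)
            print c[i] + 0, "between", inter[i - 1], "and", inter[i]
        print c[n + 1] + 0, "lower than", inter[n]
    }')

printf '%s\n' "$result"
```

Because each row is assigned to its bin independently, the order of the rows no longer matters, and an empty interval is reported with a count of 0.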

Answer 2 (score: 1)

One way to do this is a single awk command that keeps a running count for each category you are interested in.

#!/bin/bash

if [ $# -ne 1 ]
then
    echo "Usage: $0 INPUT"
    exit 1
fi

awk ' {
  if      ($2 >  2)   count[0]++
  else if ($2 >  1)   count[1]++
  else if ($2 >  0.5) count[2]++
  else if ($2 > -0.5) count[3]++
  else if ($2 > -1)   count[4]++
  else if ($2 > -2)   count[5]++
  else count[6]++
} END {
  print "      >  2\t",   count[0]
  print " 1   to  2\t",   count[1]
  print " 0.5 to  1\t",   count[2]
  print "-0.5 to  0.5\t", count[3]
  print "-1   to -0.5\t", count[4]
  print "-2   to -1\t",   count[5]
  print "      < -2\t",   count[6]
}' "$1"

Answer 3 (score: 1)

awk -f script.awk input.txt

script.awk

{
    if ($2>=2) counter1++
    else if ($2>=1) counter2++
    else if ($2>=0.5) counter3++
    else if ($2>=-0.5) counter4++
    else if ($2>=-1) counter5++
    else if ($2>=-2) counter6++
    else counter7++
}
END{
    print "Greater than 2: "counter1
    print "Between 1 and 2: "counter2
    print "Between 0.5 and 1: "counter3
    print "Between -0.5 and 0.5: "counter4
    print "Between -1 and -0.5: "counter5
    print "Between -2 and -1: "counter6
    print "Less than -2: "counter7
}

Answer 4 (score: 1)

Script toto:

awk '
      $2>=2                 { count[1]++; label[1]="Greater than or equal to 2"; }
     ($2>1    && $2<2)      { count[2]++; label[2]="Between 1 and 2"; }
     ($2>0.5  && $2<=1)     { count[3]++; label[3]="Between 0.5 and 1"; }
     ($2>-0.5 && $2<=0.5)   { count[4]++; label[4]="Between -0.5 and 0.5"; }
     ($2>-1   && $2<=-0.5)  { count[5]++; label[5]="Between -1 and -0.5"; }
     ($2>-2   && $2<=-1)    { count[6]++; label[6]="Between -2 and -1"; }
                 $2<=-2     { count[7]++; label[7]="Less than or equal to -2"; }

    END { for (i=1;i<=7;i++)
           {   printf "%-30s %s\n" ,label[i], count[i];
           }
        }
    '  /tmp/input.txt

Result:

. /tmp/toto

Greater than or equal to 2     3
Between 1 and 2                1
Between 0.5 and 1              2
Between -0.5 and 0.5           6
Between -1 and -0.5            1
Between -2 and -1              1
Less than or equal to -2       2
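One thing to watch in the script above: each label is assigned only when a row matches its category, so a category with no rows prints a blank label and a blank count. A possible variant, sketched below, sets the labels once in BEGIN and prints the counts with %d so that empty categories show 0. The sample rows are made up for illustration, and the boundary choices are assumptions.

```shell
#!/bin/sh
# Made-up sample (not from the question): only two categories are populated.
sample='g1 5
g2 0.2
g3 0.3'

result=$(printf '%s\n' "$sample" | awk '
    BEGIN {
        # Labels are set once, so they exist even when a category stays empty.
        label[1] = "Greater than or equal to 2"
        label[2] = "Between 1 and 2"
        label[3] = "Between 0.5 and 1"
        label[4] = "Between -0.5 and 0.5"
        label[5] = "Between -1 and -0.5"
        label[6] = "Between -2 and -1"
        label[7] = "Less than or equal to -2"
    }
    $2 >= 2                   { count[1]++ }
    $2 >  1    && $2 <  2     { count[2]++ }
    $2 >  0.5  && $2 <= 1     { count[3]++ }
    $2 >  -0.5 && $2 <= 0.5   { count[4]++ }
    $2 >  -1   && $2 <= -0.5  { count[5]++ }
    $2 >  -2   && $2 <= -1    { count[6]++ }
    $2 <= -2                  { count[7]++ }
    END {
        # %d prints an unset counter as 0.
        for (i = 1; i <= 7; i++)
            printf "%-30s %d\n", label[i], count[i]
    }')

printf '%s\n' "$result"
```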