Averaging columns by header

Time: 2016-03-17 18:44:46

Tags: linux awk average multiple-columns

I have files with columns like this. The sample input below is partial.

See the link to the full file further down. Each file has only two rows.

Gene    0.4%    0.7%    1.1%    1.4%    1.8%    2.2%    2.5%    2.9%    3.3%    3.6%    4.0%    4.3%    4.7%    5.1%    5.4%    5.8%    6.2%    6.5%    6.9%    7.2%    7.6%    8.0%    8.3%    8.7%    9.1%    9.4%    9.8%    10.1%   10.5%   10.9%   11.2%   11.6%   12.0%   12.3%   12.7%   13.0%   13.4%   13.8%   14.1%   14.5%   14.9%   15.2%   15.6%   15.9%   16.3%   16.7%   17.0%   17.4%   17.8%   18.1%   18.5%   18.8%   19.2%   19.6%   19.9%   20.3%   20.7%   21.0%   21.4%   21.7%   22.1%   22.5%   22.8%   23.2%   23.6%   23.9%   24.3%   24.6%   25.0%   25.4%   25.7%   26.1%   26.4%   26.8%   27.2%   27.5%   27.9%   28.3%   28.6%   29.0%   29.3%   29.7%   30.1%   30.4%   30.8%   31.2%   31.5%   31.9%   32.2%   32.6%   33.0%   33.3%   33.7%   34.1%   34.4%   34.8%   35.1%   35.5%   35.9%   36.2%   36.6%   37.0%   37.3%   37.7%   38.0%   38.4%   38.8%   39.1%   39.5%   39.9%   40.2%   40.6%   40.9%   41.3%   41.7%   42.0%   42.4%   42.8%   43.1%   43.5%   43.8%   44.2%   44.6%   44.9%   45.3%   45.7%   46.0%   46.4%   46.7%   47.1%   47.5%   47.8%   48.2%   48.6%   48.9%   49.3%   49.6%   50.0%   50.4%   50.7%   51.1%   51.4%   51.8%   52.2%   52.5%   52.9%   53.3%   53.6%   54.0%   54.3%   54.7%   55.1%   55.4%   55.8%   56.2%   56.5%   56.9%   57.2%   57.6%   58.0%   58.3%   58.7%   59.1%   59.4%   59.8%   60.1%   60.5%   60.9%   61.2%   61.6%   62.0%   62.3%   62.7%   63.0%   63.4%   63.8%   64.1%   64.5%   64.9%   65.2%   65.6%   65.9%   66.3%   66.7%   67.0%   67.4%   67.8%   68.1%   68.5%   68.8%   69.2%   69.6%   69.9%   70.3%   70.7%   71.0%   71.4%   71.7%   72.1%   72.5%   72.8%   73.2%   73.6%   73.9%   74.3%   74.6%   75.0%   75.4%   75.7%   76.1%   76.4%   76.8%   77.2%   77.5%   77.9%   78.3%   78.6%   79.0%   79.3%   79.7%   80.1%   80.4%   80.8%   81.2%   81.5%   81.9%   82.2%   82.6%   83.0%   83.3%   83.7%   84.1%   84.4%   84.8%   85.1%   85.5%   85.9%   86.2%   86.6%   87.0%   87.3%   87.7%   88.0%   88.4%   88.8%   89.1%   89.5%   89.9%   90.2%   
90.6%   90.9%   91.3%   91.7%   92.0%   92.4%   92.8%   93.1%   93.5%   93.8%   94.2%   94.6%   94.9%   95.3%   95.7%   96.0%   96.4%   96.7%   97.1%   97.5%   97.8%   98.2%   98.6%   98.9%   99.3%   99.6%   100.0%  0.4%    0.7%    1.1%    1.4%    1.8%    2.2%    2.5%    2.9%    3.3%    3.6%    4.0%    4.3%    4.7%    5.1%    5.4%    5.8%    6.2%    6.5%    6.9%    7.2%    7.6%    8.0%    8.3%    8.7%    9.1%    9.4%    9.8%    10.1%   10.5%   10.9%   11.2%   11.6%   12.0%   12.3%   12.7%   13.0%   13.4%   13.8%   14.1%   14.5%   14.9%   15.2%   15.6%   15.9%   16.3%   16.7%   17.0%   17.4%   17.8%   18.1%   18.5%   18.8%   19.2%   19.6%   19.9%   20.3%   20.7%   21.0%   21.4%   21.7%   22.1%   22.5%   22.8%   23.2%   23.6%   23.9%   24.3%   24.6%   25.0%   25.4%   25.7%   26.1%   26.4%   26.8%   27.2%   27.5%   27.9%   28.3%   28.6%   29.0%   29.3%   29.7%   30.1%   30.4%   30.8%   31.2%   31.5%   31.9%   32.2%   32.6%   33.0%   33.3%   33.7%   34.1%   34.4%   34.8%   35.1%   35.5%   35.9%   36.2%   36.6%   37.0%   37.3%   37.7%   38.0%   38.4%   38.8%   39.1%   39.5%   39.9%   40.2%   40.6%   40.9%   41.3%   41.7%   42.0%   42.4%   42.8%   43.1%   43.5%   43.8%   44.2%   44.6%   44.9%   45.3%   45.7%   46.0%   46.4%   46.7%   47.1%   47.5%   47.8%   48.2%   48.6%   48.9%   49.3%   49.6%   50.0%   50.4%   50.7%   51.1%   51.4%   51.8%   52.2%   52.5%   52.9%   53.3%   53.6%   54.0%   54.3%   54.7%   55.1%   55.4%   55.8%   56.2%   56.5%   56.9%   57.2%   57.6%   58.0%   58.3%   58.7%   59.1%   59.4%   59.8%   60.1%   60.5%   60.9%   61.2%   61.6%   62.0%   62.3%   62.7%   63.0%   63.4%   63.8%   64.1%   64.5%   64.9%   65.2%   65.6%   65.9%   66.3%   66.7%   67.0%   67.4%   67.8%   68.1%   68.5%   68.8%   69.2%   69.6%   69.9%   70.3%   70.7%   71.0%   71.4%   71.7%   72.1%   72.5%   72.8%   73.2%   73.6%   73.9%   74.3%   74.6%   75.0%   75.4%   75.7%   76.1%   76.4%   76.8%   77.2%   77.5%   77.9%   78.3%   78.6%   79.0%   79.3%   79.7%   80.1%   80.4%   80.8%   
81.2%   81.5%   81.9%   82.2%   82.6%   83.0%   83.3%   83.7%   84.1%   84.4%   84.8%   85.1%   85.5%   85.9%   86.2%   86.6%   87.0%   87.3%   87.7%   88.0%   88.4%   88.8%   89.1%   89.5%   89.9%   90.2%   90.6%   90.9%   91.3%   91.7%   92.0%   92.4%   92.8%   93.1%   93.5%   93.8%   94.2%   94.6%   94.9%   95.3%   95.7%   96.0%   96.4%   96.7%   97.1%   97.5%   97.8%   98.2%   98.6%   98.9%   99.3%   99.6%   100.0%

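Basically, this is what I need to do.

a. Start from the second column, which is 0.4% here.

b. Continue until you reach "10" in the header name. If a header is exactly 10.0%, include that column too. If not, include only the columns before it. In this example, since we have 10.1% (column 29), we include the columns from 0.4% (column 2) to 9.8% (column 28). If column 29 were 10.0%, it would be included as well.

c. Average the values in the second row for those columns (the data is not shown here; see this link for the full dataset: https://goo.gl/W8jND7). In this example, that means from 0.4% (column 2) through 9.8% (column 28).

d. In the output, print the first column, "Gene", followed by this average, with the column headers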

Gene Average_10%

e. Then start checking from 10.1% (column 29) until you hit "20" in the header name. Repeat steps b to d, and print the output as

Gene Average_10% Average_20%

Repeat this until

Gene Average_10% Average_20% Average_30% Average_40% Average_50% Average_60% Average_70% Average_80% Average_90% Average_100%

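f. Reaching 100% means one dataset is complete.

g. If you look closely at my column headers, after the first 100% there is another run of 0.4%-100% columns. The input file linked above contains 13 of these 0.4%-100% runs.

i. I have multiple files, and the headers can be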

1% 2% 3%....100%
1.5% 2.5% 3.5%....100%

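depending on the file. But the averaging logic (when you hit "10", "20", and so on) is always the same. The number of samples, 13, is also the same, meaning each file contains the run up to 100% 13 times.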
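The rule in steps a to e amounts to rounding each header value up to the next multiple of 10, with an exact multiple staying in its own bin. A minimal sketch of that mapping follows; `bin_label` is a hypothetical helper name used only for illustration, and it assumes each header is a number followed by a single `%`.

```shell
# Hypothetical helper: map one header such as "9.8%" to its output label.
bin_label() {
  awk -v h="$1" 'BEGIN {
    sub(/%$/, "", h)          # drop the trailing percent sign
    h += 0                    # force a numeric value
    b = int(h / 10) * 10      # largest multiple of 10 not above h
    if (b < h) b += 10        # round up unless h is already a multiple of 10
    print "Average_" b "%"
  }'
}

bin_label 9.8%     # Average_10%
bin_label 10.0%    # Average_10%
bin_label 10.1%    # Average_20%
bin_label 100.0%   # Average_100%
```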

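1 Answer:

Answer 0: (score: 0)

I have to say, this is a terrible format. I don't expect anyone to come up with the final solution for you, but this is how I would approach it: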

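awk 'NR == 1 {
    gsub("%","");                # strip the percent signs so headers compare as numbers
    for (f=2; f<=NF; f++) {      # column 1 is "Gene", so start at column 2
      for (i=1; i<10; i++)
          # column f is the last header below the threshold 10*i
          if ($f<10*i && $(f+1)>=10*i) print f, $f
      if ($f==100) print f, $f   # 100% closes a dataset; handled as a special case
    }}' file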

28 9.8
56 19.9
83 29.7
111 39.9
138 49.6
166 59.8
194 69.9
221 79.7
249 89.9
277 100.0
304 9.8
332 19.9
359 29.7
387 39.9
414 49.6
442 59.8
470 69.9
497 79.7
525 89.9
553 100.0

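This prints the column index and the header value at each threshold, for verification purposes. Once the column boundaries are extracted, summing the columns in each range is straightforward. Note that, by your logic, 100% should never be included, but that seems wrong, so I handle it as a special case.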
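One way to carry the boundary idea through to the averages is sketched below. It has not been run against the linked dataset. The function name `bin_averages` and the `NA` placeholder are made up for illustration, and the sketch assumes whitespace-separated fields, a header on line 1, and header percentages that only decrease when a new dataset begins. Unlike the boundary check above, it keeps a header that is exactly a multiple of 10 (for example 50.0%) in the lower bin, following step b of the question, so 100% needs no special case.

```shell
# Sketch only: average each data row per 10% header bin.
bin_averages() {
  awk '
    NR == 1 {
      gsub(/%/, "")                          # strip % so headers are numeric
      n = 0
      for (f = 2; f <= NF; f++) {
        h = $f + 0
        b = int(h / 10)                      # bin number = ceil(h / 10),
        if (b * 10 < h) b++                  # so exactly 10.0 stays in bin 1
        if (b < 1) b = 1                     # a 0% header would join the first bin
        # open a new output column when the bin changes or the header restarts
        if (f == 2 || b != lastb || h < prev) label[++n] = "Average_" (b * 10) "%"
        grp[f] = n; lastb = b; prev = h
      }
      printf "%s", $1
      for (k = 1; k <= n; k++) printf " %s", label[k]
      print ""
      next
    }
    {
      for (k = 1; k <= n; k++) sum[k] = cnt[k] = 0
      for (f = 2; f <= NF; f++)
        if (f in grp) { sum[grp[f]] += $f; cnt[grp[f]]++ }
      printf "%s", $1
      for (k = 1; k <= n; k++) printf " %s", (cnt[k] ? sum[k] / cnt[k] : "NA")
      print ""
    }
  ' "$@"
}

# Tiny made-up input: two datasets of three columns each.
printf 'Gene 4%% 10%% 12%% 4%% 10%% 12%%\ng1 1 3 5 2 4 6\n' | bin_averages
```

On the made-up input this prints the header line `Gene Average_10% Average_20% Average_10% Average_20%` followed by `g1 2 5 3 6`. On a real file it would be called as `bin_averages file`, and the labels would repeat once per dataset.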