Print search pattern in awk

Asked: 2014-07-07 14:32:22

Tags: linux bash awk

I want to print the lines that match a search pattern and then compute an average over those lines. An example shows it best:

Input file:

chr17   41275978    41276294    BRCA1_ex02_01   278 
chr17   41275978    41276294    BRCA1_ex02_01   279 
chr17   41275978    41276294    BRCA1_ex02_01   280 
chr17   41275978    41276294    BRCA1_ex02_02   281 
chr17   41275978    41276294    BRCA1_ex02_02   282 
chr17   41275978    41276294    BRCA1_ex02_03   283 
chr17   41275978    41276294    BRCA1_ex02_03   284 
chr17   41275978    41276294    BRCA1_ex02_03   285 
chr17   41275978    41276294    BRCA1_ex02_04   286 
chr17   41275978    41276294    BRCA1_ex02_04   287 
chr17   41275978    41276294    BRCA1_ex02_04   288 

In a bash loop, I want to extract (for example) only the lines that share the same 4th column:

OUTPUT1:

chr17   41275978    41276294    BRCA1_ex02_01   278 
chr17   41275978    41276294    BRCA1_ex02_01   279 
chr17   41275978    41276294    BRCA1_ex02_01   280 

OUTPUT2:

chr17   41275978    41276294    BRCA1_ex02_02   281 
chr17   41275978    41276294    BRCA1_ex02_02   282 

OUTPUT3:

chr17   41275978    41276294    BRCA1_ex02_03   283 
chr17   41275978    41276294    BRCA1_ex02_03   284 
chr17   41275978    41276294    BRCA1_ex02_03   285 

and so on... After that, computing the average of column 5 is very easy:

awk '{ sum += $5 } END { print sum / NR }' in_file.txt

In my case there are thousands of BRCA1_exXX_XX lines, so any idea how to split them?

Paul.

2 answers:

Answer 0: (score: 2)

I think this will do what you want.

awk '{
    # Keep running sum of fifth column based on value of fourth column.
    v[$4]+=$5;
    # Keep count of lines with similar fourth column values.
    n[$4]++
}

END {
    # Loop over all the fourth-column values we saw and print each one with the average of its fifth column.
    for (val in n) {
        print val ": " v[val] / n[val]
    }
}' "$file"
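
As a rough check of the approach above, here is a minimal sketch run against the first five lines of the question's sample data. The function name `group_avg` and the temporary file are assumptions made for illustration. The trailing `sort` is also an addition: awk's `for (val in n)` visits keys in an unspecified order, so sorting keeps the output stable.

```shell
# group_avg is a hypothetical wrapper around the awk program above.
# $1: whitespace-separated input file with the key in column 4
#     and the value in column 5.
group_avg() {
    awk '{ v[$4] += $5; n[$4]++ }
         END { for (val in n) print val ": " v[val] / n[val] }' "$1" | sort
}

# Sample data copied from the question (first two groups only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
chr17   41275978    41276294    BRCA1_ex02_01   278
chr17   41275978    41276294    BRCA1_ex02_01   279
chr17   41275978    41276294    BRCA1_ex02_01   280
chr17   41275978    41276294    BRCA1_ex02_02   281
chr17   41275978    41276294    BRCA1_ex02_02   282
EOF

group_avg "$sample"
# BRCA1_ex02_01: 279
# BRCA1_ex02_02: 281.5

rm -f "$sample"
```

Because the sums and counts live in arrays keyed by column 4, the input does not need to be sorted for this to work.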

Answer 1: (score: 1)

Assuming the entries are sorted by column 4, as in the given data, you could do this:

awk '

  $4 != prev {              # if this line's 4th column is different from the previous line
    if (cnt > 0)            # if count of lines is greater than 0
      print prev, sum / cnt #   print the average
    prev = $4               # remember this line's 4th column
    sum = $5                # initialize sum to column 5
    cnt = 1                 # initialize count to 1
    next                    # go to next line
  }

  {
    sum += $5               # accumulate total of 5th column
    ++cnt                   # increment count of lines
  }

  END {
    if (cnt > 0)             # if count > 0 (avoid divide by 0 on empty file)
      print prev, sum / cnt  #   print the average for the last group
  }

' file
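
If the separate per-group outputs from the question (OUTPUT1, OUTPUT2, ...) are wanted as actual files for a bash loop, one possible sketch is shown below. The function name `split_by_col4`, the `.txt` suffix, and the temporary directory are assumptions made for illustration, and it assumes column 4 never contains a `/`.

```shell
# split_by_col4 is a hypothetical helper.
# $1: input file, $2: existing (ideally empty) output directory.
# Each line is appended to <dir>/<column 4>.txt.
split_by_col4() {
    awk -v dir="$2" '{
        out = dir "/" $4 ".txt"
        if (out != prev_out) {
            # Close the previous file so thousands of groups
            # do not exhaust the open-file limit.
            if (prev_out != "") close(prev_out)
            prev_out = out
        }
        print >> out    # append, so a group may reappear later in the input
    }' "$1"
}

# Demo on two groups of the sample data from the question.
work=$(mktemp -d)
cat > "$work/in_file.txt" <<'EOF'
chr17   41275978    41276294    BRCA1_ex02_01   278
chr17   41275978    41276294    BRCA1_ex02_01   279
chr17   41275978    41276294    BRCA1_ex02_01   280
chr17   41275978    41276294    BRCA1_ex02_02   281
chr17   41275978    41276294    BRCA1_ex02_02   282
EOF
mkdir "$work/groups"
split_by_col4 "$work/in_file.txt" "$work/groups"

# Loop over the per-group files and average column 5 in each.
for f in "$work/groups"/*.txt; do
    awk -v name="$(basename "$f" .txt)" \
        '{ sum += $5 } END { if (NR > 0) print name, sum / NR }' "$f"
done

rm -rf "$work"
```

Since the helper appends with `>>`, rerunning it into a directory that already holds output would duplicate lines, which is why an empty output directory is assumed. If only the averages are needed, the single-pass awk programs above avoid the intermediate files entirely.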