Frequency counts across multiple columns using R

Time: 2018-07-23 17:14:49

Tags: r count

I have a data frame of the form:

x <-
Chrom    sample1    sample2    sample3  ...
Contig12    0/0     0/0     0/1
Contig12    ./.     ./.     0/0
Contig28    0/0     0/0     0/0
Contig28    1/1     1/1     1/1
Contig55    0/0     0/0     0/1
Contig55    0/1     0/1     0/1
Contig61    ./.     0/1     1/1
.
.
.

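It has ~20,000 rows and ~100 unique columns. I am trying to count how many times each unique state occurs in each column (sample), so that I end up with: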

         sample1    sample2     sample3     ...
./.      2          1           0
0/0      3          3           2
0/1      1          2           3
1/1      1          1           2

Any suggestions on how to do this? I tried using count() from the plyr package, but could not figure out how to apply it to every column.

Thanks very much for your help!

1 Answer:

Answer 0: (score: 2)

library(dplyr)
library(tidyr)                        # gather() comes from tidyr, not dplyr
df %>% gather(key, value, -Chrom) %>% # reshape from wide to long: every column except Chrom is
                                      # collapsed into two columns, key and value. See ?gather
       dplyr::select(-Chrom) %>%      # drop Chrom, keeping only key and value
       table()                        # count each unique (key, value) pair

         value
 key       ./. 0/0 0/1 1/1
  sample1   2   3   1   1
  sample2   1   3   2   1
  sample3   0   2   3   2
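
If the layout asked for in the question is preferred (states as rows, samples as columns), a base R variant might look like the following. This is only a sketch and not part of the original answer: the names `states` and `counts` are illustrative, and the sample data is repeated so that the snippet runs on its own.

```r
# Sample data, repeated here so the snippet is self-contained
df <- read.table(text = "
Chrom    sample1    sample2    sample3
Contig12    0/0     0/0     0/1
Contig12    ./.     ./.     0/0
Contig28    0/0     0/0     0/0
Contig28    1/1     1/1     1/1
Contig55    0/0     0/0     0/1
Contig55    0/1     0/1     0/1
Contig61    ./.     0/1     1/1
", header = TRUE, stringsAsFactors = FALSE)

# Every state that occurs anywhere in the sample columns (all columns but Chrom)
states <- sort(unique(unlist(df[-1])))

# Tabulate each sample column against the same set of levels, so that a state
# missing from a column is reported as 0 instead of being dropped
counts <- sapply(df[-1], function(col) table(factor(col, levels = states)))
counts
```

Fixing the factor levels up front is what keeps the rows aligned across samples; without it, a column lacking one of the states would return a shorter table and `sapply()` could no longer build a matrix.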

Data

df <- read.table(text="
      Chrom    sample1    sample2    sample3
             Contig12    0/0     0/0     0/1
             Contig12    ./.     ./.     0/0
             Contig28    0/0     0/0     0/0
             Contig28    1/1     1/1     1/1
             Contig55    0/0     0/0     0/1
             Contig55    0/1     0/1     0/1
             Contig61    ./.     0/1     1/1
              ",header=TRUE, stringsAsFactors = FALSE)
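
In recent tidyr releases `gather()` is marked as superseded in favour of `pivot_longer()`. As a hedged sketch, assuming tidyr 1.0 or later together with dplyr, the same counts could be produced with the newer verbs and spread back into the layout shown in the question. The name `wide` is illustrative, and the data is repeated only so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same sample data as above
df <- read.table(text = "
Chrom    sample1    sample2    sample3
Contig12    0/0     0/0     0/1
Contig12    ./.     ./.     0/0
Contig28    0/0     0/0     0/0
Contig28    1/1     1/1     1/1
Contig55    0/0     0/0     0/1
Contig55    0/1     0/1     0/1
Contig61    ./.     0/1     1/1
", header = TRUE, stringsAsFactors = FALSE)

wide <- df %>%
  # wide to long: one row per (Chrom, sample, state)
  pivot_longer(-Chrom, names_to = "key", values_to = "value") %>%
  # number of rows for each sample/state combination
  count(key, value) %>%
  # back to wide: one row per state, one column per sample, 0 where absent
  pivot_wider(names_from = key, values_from = n, values_fill = 0)

wide
```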