How to get the frequency of items in a specific column of a data frame

Time: 2015-12-12 13:55:46

Tags: r

I have a data frame mydf. I want to count each item in the combination column to get a result like the one shown below.

   mydf <- structure(c("AMLM12001KP", "AMLM120XP", "AMLM12001KP", "1231401",
                       "1231401", "1231401", "ANKRD30BL*", "WDR70*NXPH1", "WDR70*NXPH1",
                       "FGGY*", "LIN28A*DFNB59", "AK2*"),
                     .Dim = c(6L, 2L),
                     .Dimnames = list(NULL, c("customer_sample_id", "combination")))

Result

combination      frequency    customer_sample_id
ANKRD30BL*       1 sample     AMLM12001KP 
WDR70*NXPH1      2 sample     AMLM120XP, AMLM12001KP
FGGY*            1 sample     1231401
LIN28A*DFNB59    1 sample     1231401
AK2*             1 sample     1231401 

1 answer:

Answer 0 (score: 1)

In base R:

aggregate(customer_sample_id ~ combination, data = mydf,
          FUN = function(x) c(length(x), paste(x, collapse = ",")))
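
The `aggregate()` call above returns both values in a single matrix-valued column, because `FUN` returns a vector of length two, and the count is coerced to character. A minimal sketch, assuming separate `frequency` and `customer_sample_id` columns are wanted, is to aggregate twice and merge. The names `df`, `freq`, `ids`, and `res` are illustrative and not from the original answer.

```r
# Data from the question; structure() with .Dim builds a character matrix.
mydf <- structure(c("AMLM12001KP", "AMLM120XP", "AMLM12001KP", "1231401",
                    "1231401", "1231401", "ANKRD30BL*", "WDR70*NXPH1", "WDR70*NXPH1",
                    "FGGY*", "LIN28A*DFNB59", "AK2*"),
                  .Dim = c(6L, 2L),
                  .Dimnames = list(NULL, c("customer_sample_id", "combination")))

# Convert the matrix to a data frame with plain character columns.
df <- as.data.frame(mydf, stringsAsFactors = FALSE)

# Number of rows per combination.
freq <- aggregate(customer_sample_id ~ combination, data = df, FUN = length)
names(freq)[2] <- "frequency"

# Sample IDs per combination, collapsed into one string.
ids <- aggregate(customer_sample_id ~ combination, data = df,
                 FUN = function(x) paste(x, collapse = ", "))

# Join the two summaries on the grouping column.
res <- merge(freq, ids, by = "combination")
print(res)
```

The rows come back sorted by `combination` rather than in order of first appearance, so the order may differ from the table in the question.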

With data.table:

library(data.table)
# mydf is a matrix, so convert it first
mydt <- as.data.table(mydf)
# .N is the number of rows in each group
mydt[, .(freq = .N, customer_sample_id = paste(customer_sample_id, collapse = ",")), by = combination]

With dplyr:

library(dplyr)
# data.frame() converts the matrix; n() counts the rows in each group
data.frame(mydf) %>%
  group_by(combination) %>%
  summarise(freq = n(), customer_sample_id = paste(customer_sample_id, collapse = ","))
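
If only the counts are needed, without the collapsed sample IDs, base R's `table()` may be enough. A small sketch under that assumption, indexing the `combination` column by name so that it works on the matrix from the question; `counts` and `counts_df` are illustrative names:

```r
# Data from the question (a character matrix).
mydf <- structure(c("AMLM12001KP", "AMLM120XP", "AMLM12001KP", "1231401",
                    "1231401", "1231401", "ANKRD30BL*", "WDR70*NXPH1", "WDR70*NXPH1",
                    "FGGY*", "LIN28A*DFNB59", "AK2*"),
                  .Dim = c(6L, 2L),
                  .Dimnames = list(NULL, c("customer_sample_id", "combination")))

# Tabulate how often each value occurs in the combination column.
counts <- table(mydf[, "combination"])

# Turn the table into a two-column data frame with readable names.
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_df) <- c("combination", "frequency")
print(counts_df)
```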