I have a data frame, mydf. I want to count each item in the combination column to get the result shown below.
mydf <- structure(c("AMLM12001KP", "AMLM120XP", "AMLM12001KP", "1231401",
    "1231401", "1231401", "ANKRD30BL*", "WDR70*NXPH1", "WDR70*NXPH1",
    "FGGY*", "LIN28A*DFNB59", "AK2*"), .Dim = c(6L, 2L), .Dimnames = list(
    NULL, c("customer_sample_id", "combination")))
Desired result:
combination      frequency   customer_sample_id
ANKRD30BL*       1 sample    AMLM12001KP
WDR70*NXPH1      2 sample    AMLM120XP, AMLM12001KP
FGGY*            1 sample    1231401
LIN28A*DFNB59    1 sample    1231401
AK2*             1 sample    1231401
Answer 0 (score: 1)
In base R:
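# mydf is a character matrix, so convert it to a data frame for the formula interface
aggregate(customer_sample_id ~ combination, data = as.data.frame(mydf),
          FUN = function(x) c(length(x), paste(x, collapse = ",")))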
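Note that because FUN returns a length-two vector, aggregate() stores the count and the pasted IDs together in a single character matrix column. As a rough sketch of one way to get separate columns in base R, the sample IDs can be split by combination and then counted and pasted. The names df, groups and result are introduced here for illustration and are not from the question:

```r
# Data from the question (a 6 x 2 character matrix)
mydf <- structure(c("AMLM12001KP", "AMLM120XP", "AMLM12001KP", "1231401",
    "1231401", "1231401", "ANKRD30BL*", "WDR70*NXPH1", "WDR70*NXPH1",
    "FGGY*", "LIN28A*DFNB59", "AK2*"), .Dim = c(6L, 2L), .Dimnames = list(
    NULL, c("customer_sample_id", "combination")))

# Convert the matrix to a data frame so columns can be accessed with $
df <- as.data.frame(mydf, stringsAsFactors = FALSE)

# One character vector of sample IDs per combination
groups <- split(df$customer_sample_id, df$combination)

# Count the IDs in each group and paste them into one string per group
result <- data.frame(
  combination = names(groups),
  frequency = unname(lengths(groups)),
  customer_sample_id = unname(vapply(groups, paste, character(1), collapse = ", "))
)
print(result)
```

split() returns the groups sorted by combination, so the rows will not be in order of first appearance as in the desired table.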
Or with data.table:
library(data.table)
mydt <- as.data.table(mydf)
mydt[, .(freq = .N, customer_sample_id = paste(customer_sample_id, collapse = ",")), by = combination]
Or with dplyr:
library(dplyr)
data.frame(mydf) %>%
group_by(combination) %>%
summarise(freq = n(), customer_sample_id = paste(customer_sample_id, collapse = ","))