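I've been going crazy over something basic...
I am trying to count each unique ID that appears in a data frame and list those IDs in a comma-separated column, e.g.: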
df<-data.frame(id = as.character(c("a", "a", "a", "b", "c", "d", "d", "e", "f")), x1=c(3,1,1,1,4,2,3,3,3),
x2=c(6,1,1,1,3,2,3,3,1),
x3=c(1,1,1,1,1,2,3,3,2))
> df
  id x1 x2 x3
1  a  3  6  1
2  a  1  1  1
3  a  1  1  1
4  b  1  1  1
5  c  4  3  1
6  d  2  2  2
7  d  3  3  3
8  e  3  3  3
9  f  3  1  2
For each column, I am trying to get the count of unique IDs that satisfy a condition (value > 1), along with their names:
res = data.frame(x1_counts = 5, x1_names = "a,c,d,e,f", x2_counts = 4, x2_names = "a,c,d,e", x3_counts = 3, x3_names = "d,e,f")
> res
  x1_counts  x1_names x2_counts x2_names x3_counts x3_names
1         5 a,c,d,e,f         4  a,c,d,e         3    d,e,f
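I tried data.table, but it seems complicated, i.e.:
DT = as.data.table(df)
res <- DT[, list(x1 = length(unique(id[which(x1 > 1)])), x2 = length(unique(id[which(x2 > 1)]))), by = id]
But I can't get it right. I don't think I'm using data.table the way I need to, because grouping by id is not really the grouping I'm looking for. Can you point me in the right direction? Thanks a lot!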
Answer 0 (score: 2)
You can reshape the data to long format and then summarize:
library(data.table)
(melt(setDT(df), id.vars = "id")[value > 1]
[, .(counts = uniqueN(id), names = list(unique(id))), variable])
# Replace list(unique(id)) with toString(unique(id)) if you want the names
# as a single string (separated by ", ") instead of a list column
# variable counts names
#1: x1 5 a,c,d,e,f
#2: x2 4 a,c,d,e
#3: x3 3 d,e,f
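As a minimal sketch of the string variant mentioned in the comment above, the summary can build the names with `paste(..., collapse = ",")`, which gives the comma-only separator used in the question (`toString()` would insert ", " instead). The object names `long` and `summary_dt` are only illustrative, and `as.data.table()` is used here so that `df` itself is left unmodified.

```r
library(data.table)

# Same example data as in the question
df <- data.frame(
  id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
  x1 = c(3, 1, 1, 1, 4, 2, 3, 3, 3),
  x2 = c(6, 1, 1, 1, 3, 2, 3, 3, 1),
  x3 = c(1, 1, 1, 1, 1, 2, 3, 3, 2)
)

# Long format: one row per (id, variable) pair
long <- melt(as.data.table(df), id.vars = "id")

# Keep values > 1, then count and collapse the unique ids per variable
summary_dt <- long[value > 1,
                   .(counts = uniqueN(id),
                     names  = paste(unique(id), collapse = ",")),
                   by = variable]
print(summary_dt)
```

Here `names` is an ordinary character column rather than a list column, which tends to be easier to write out to a CSV file.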
To get the layout you asked for, reshape the summary back to wide format:
# "." on the left-hand side means "no grouping variable", so the result is a single row
dcast((melt(setDT(df), id.vars = "id")[value > 1]
       [, .(counts = uniqueN(id), names = list(unique(id))), variable]),
      . ~ variable,
      value.var = c('counts', 'names'))
# . counts_x1 counts_x2 counts_x3 names_x1 names_x2 names_x3
# 1: . 5 4 3 a,c,d,e,f a,c,d,e d,e,f
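The wide result above names its columns `counts_x1`, `names_x1`, and so on, whereas the question asked for `x1_counts`, `x1_names`, interleaved per column. One possible way to get that exact layout is a plain base R loop over the value columns. This is an alternative sketch rather than the approach used in this answer, and `value_cols`, `pieces`, and `res` are illustrative names.

```r
# Same example data as in the question
df <- data.frame(
  id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
  x1 = c(3, 1, 1, 1, 4, 2, 3, 3, 3),
  x2 = c(6, 1, 1, 1, 3, 2, 3, 3, 1),
  x3 = c(1, 1, 1, 1, 1, 2, 3, 3, 2)
)

# Every column except the id column is treated as a value column
value_cols <- setdiff(names(df), "id")

# For each value column, build a one-row data frame holding the count
# and the comma-separated names of the unique ids with value > 1
pieces <- lapply(value_cols, function(col) {
  ids <- unique(df$id[df[[col]] > 1])
  out <- data.frame(length(ids), paste(ids, collapse = ","),
                    stringsAsFactors = FALSE)
  names(out) <- paste0(col, c("_counts", "_names"))
  out
})

# Bind the pieces side by side into a single one-row result
res <- do.call(cbind, pieces)
print(res)
```

Because the threshold and the column list are ordinary R values here, the same loop can be adapted to a different condition by changing the `df[[col]] > 1` test.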