R: count and list the unique rows for each column that satisfy a condition

Time: 2016-09-20 23:49:53

Tags: r sum data.table

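I've been going crazy over something that should be basic...

I'm trying to count each unique ID that occurs in a data frame and list those IDs in a comma-separated column, for example: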

df<-data.frame(id = as.character(c("a", "a", "a", "b", "c", "d", "d", "e", "f")), x1=c(3,1,1,1,4,2,3,3,3),
x2=c(6,1,1,1,3,2,3,3,1),
x3=c(1,1,1,1,1,2,3,3,2))

> df
  id x1 x2 x3
1  a  3  6  1
2  a  1  1  1
3  a  1  1  1
4  b  1  1  1
5  c  4  3  1
6  d  2  2  2
7  d  3  3  3
8  e  3  3  3
9  f  3  1  2

For each column, I'm trying to get the count of unique IDs that satisfy a condition, here a value > 1:

res = data.frame(x1_counts = 5, x1_names = "a,c,d,e,f", x2_counts = 4, x2_names = "a,c,d,e", x3_counts = 3, x3_names = "d,e,f")

> res
  x1_counts  x1_names x2_counts x2_names x3_counts x3_names
1         5 a,c,d,e,f         4  a,c,d,e         3    d,e,f

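I tried data.table, but it seems complicated, i.e.:

DT = as.data.table(df)
res <- DT[, list(x1 = length(unique(id[which(x1 > 1)])), x2 = length(unique(id[which(x2 > 1)]))), by = id]

But I can't get it right, and I don't think this is how I should be using data.table, since grouping by id isn't really what I'm after. Could you point me in the right direction? Thanks a lot!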

1 answer:

Answer 0 (score: 2)

You can reshape the data to long format and then summarize by variable:

library(data.table)
(melt(setDT(df), id.vars = "id")[value > 1]
   [, .(counts = uniqueN(id), names = list(unique(id))), variable])
   # Replace list() with toString() if you want names as a single string instead of a list column

#   variable counts     names
#1:       x1      5 a,c,d,e,f
#2:       x2      4   a,c,d,e
#3:       x3      3     d,e,f
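
As a minimal sketch of the string variant mentioned in the comment above, the list column can be swapped for a collapsed string. This version uses `paste(..., collapse = ",")` rather than `toString()`, on the assumption that the exact `a,c,d` format from the question is wanted (`toString()` separates with a comma plus a space). The names `long` and `res_long` are illustrative, and `as.data.table()` is used here instead of `setDT()` only to leave `df` unmodified.

```r
library(data.table)

# Same data as in the question
df <- data.frame(id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
                 x1 = c(3, 1, 1, 1, 4, 2, 3, 3, 3),
                 x2 = c(6, 1, 1, 1, 3, 2, 3, 3, 1),
                 x3 = c(1, 1, 1, 1, 1, 2, 3, 3, 2),
                 stringsAsFactors = FALSE)

# Long format: one row per (id, variable) pair
long <- melt(as.data.table(df), id.vars = "id")

# Keep values > 1, then count and collapse the unique ids per variable
res_long <- long[value > 1,
                 .(counts = uniqueN(id),
                   names  = paste(unique(id), collapse = ",")),
                 by = variable]
res_long
```

With the question's data, the `x2` row should come out as a count of 4 with names `a,c,d,e`, matching the list-column output above.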

To get exactly the layout you need, reshape the summary back to wide format:

dcast(1~variable, 
      data = (melt(setDT(df), id.vars = "id")[value > 1]
                 [, .(counts = uniqueN(id), names = list(unique(id))), variable]),  
      value.var = c('counts', 'names'))

#    . counts_x1 counts_x2 counts_x3  names_x1 names_x2 names_x3
# 1: .         5         4         3 a,c,d,e,f  a,c,d,e    d,e,f
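
If the exact column names from the question (`x1_counts`, `x1_names`, ...) matter, one alternative is to skip the reshaping and loop over the value columns directly. This is only a rough base R sketch, not part of the original answer; `cols`, `pieces`, and `res` are illustrative names, and the threshold of 1 is taken from the question.

```r
# Same data as in the question
df <- data.frame(id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
                 x1 = c(3, 1, 1, 1, 4, 2, 3, 3, 3),
                 x2 = c(6, 1, 1, 1, 3, 2, 3, 3, 1),
                 x3 = c(1, 1, 1, 1, 1, 2, 3, 3, 2),
                 stringsAsFactors = FALSE)

cols <- c("x1", "x2", "x3")

# For each value column, build a two-element named list:
# the number of unique ids with a value > 1, and those ids as one string
pieces <- lapply(cols, function(col) {
  ids <- unique(df$id[df[[col]] > 1])
  out <- list(length(ids), paste(ids, collapse = ","))
  names(out) <- paste0(col, c("_counts", "_names"))
  out
})

# Concatenate the per-column lists into one named list, then a one-row data frame
res <- as.data.frame(do.call(c, pieces), stringsAsFactors = FALSE)
res
```

This trades the melt/dcast round trip for an explicit loop, which may be easier to adapt if each column needs a different condition.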