Question

I've been going crazy over something basic...

I'm trying to count, and list in a comma-separated column, each unique ID that appears in a data frame, e.g.:

df<-data.frame(id = as.character(c("a", "a", "a", "b", "c", "d", "d", "e", "f")), x1=c(3,1,1,1,4,2,3,3,3),
x2=c(6,1,1,1,3,2,3,3,1),
x3=c(1,1,1,1,1,2,3,3,2))

> df
  id x1 x2 x3
1  a  3  6  1
2  a  1  1  1
3  a  1  1  1
4  b  1  1  1
5  c  4  3  1
6  d  2  2  2
7  d  3  3  3
8  e  3  3  3
9  f  3  1  2

For each column, I'm trying to get the count and the names of the unique IDs whose value satisfies a condition, here > 1:

res = data.frame(x1_counts =5, x1_names="a,c,d,e,f", x2_counts = 4, x2_names="a,c,d,e", x3_counts = 3, x3_names="d,e,f")

> res
  x1_counts  x1_names x2_counts x2_names x3_counts x3_names
1         5 a,c,d,e,f         4  a,c,d,e         3    d,e,f

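I tried data.table, but it seems complicated, i.e.:

DT = as.data.table(df)
res <- DT[, list(x1 = length(unique(id[which(x1 > 1)])), x2 = length(unique(id[which(x2 > 1)]))), by = id]

But I can't get it right, and I don't think this will give me what I need, because grouping by id isn't really what I'm after. Can you point me in the right direction? Thanks a lot!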

Answer 1

You can reshape the data to long format and then summarize:

library(data.table)
(melt(setDT(df), id.vars = "id")[value > 1]
   [, .(counts = uniqueN(id), names = list(unique(id))), variable])
   # Replace list() with toString() if you want the names as a single string instead of a list column

#   variable counts     names
#1:       x1      5 a,c,d,e,f
#2:       x2      4   a,c,d,e
#3:       x3      3     d,e,f
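
If the names are needed as a plain character column, the same summary can be sketched as below. This is a minimal variant, not part of the original answer: it uses `paste(..., collapse = ",")` rather than `toString()`, because `toString()` separates items with ", " (comma plus space). The object names `long` and `summ` are illustrative only.

```r
library(data.table)

# Example data from the question
df <- data.frame(id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
                 x1 = c(3, 1, 1, 1, 4, 2, 3, 3, 3),
                 x2 = c(6, 1, 1, 1, 3, 2, 3, 3, 1),
                 x3 = c(1, 1, 1, 1, 1, 2, 3, 3, 2),
                 stringsAsFactors = FALSE)

# Long format: one row per (id, variable) pair
long <- melt(as.data.table(df), id.vars = "id")

# Keep values > 1, then count and collapse the unique ids per variable
summ <- long[value > 1,
             .(counts = uniqueN(id),
               names  = paste(unique(id), collapse = ",")),
             by = variable]
print(summ)
```

Here `as.data.table(df)` makes a copy, so `df` itself stays a plain data frame, whereas `setDT(df)` converts it in place.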

To get exactly the layout you asked for, reshape the result to wide format:

dcast(1~variable, 
      data = (melt(setDT(df), id.vars = "id")[value > 1]
                 [, .(counts = uniqueN(id), names = list(unique(id))), variable]),  
      value.var = c('counts', 'names'))

#    . counts_x1 counts_x2 counts_x3  names_x1 names_x2 names_x3
# 1: .         5         4         3 a,c,d,e,f  a,c,d,e    d,e,f
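
For comparison, a one-row result with the exact column names from the question (`x1_counts`, `x1_names`, ...) can also be built in base R without reshaping. This is only a sketch of an alternative, not the approach used in the answer above; the helper name `summarise_col` is hypothetical.

```r
# Example data from the question
df <- data.frame(id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
                 x1 = c(3, 1, 1, 1, 4, 2, 3, 3, 3),
                 x2 = c(6, 1, 1, 1, 3, 2, 3, 3, 1),
                 x3 = c(1, 1, 1, 1, 1, 2, 3, 3, 2),
                 stringsAsFactors = FALSE)

# Hypothetical helper: summarise one column as a one-row data frame
summarise_col <- function(col) {
  ids <- unique(df$id[df[[col]] > 1])          # unique ids with value > 1
  out <- data.frame(length(ids),
                    paste(ids, collapse = ","),
                    stringsAsFactors = FALSE)
  names(out) <- paste0(col, c("_counts", "_names"))
  out
}

# Bind the per-column summaries side by side into a single row
res <- do.call(cbind, lapply(c("x1", "x2", "x3"), summarise_col))
print(res)
```

This loops over the value columns explicitly, which is simple for three columns; the melt/dcast approach scales more naturally when there are many columns.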
