Question

I have a dataset that looks like this:

Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y

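Basically, Comp1 and Comp2 are companies, and Product is the name of a product the two companies have in common. I would like the output to look like this: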

Product Bundle    Count
P1,P2,P3          2
P2,P4             1

I am new to R and would appreciate any help with this.

Answer 1

With dplyr, you can summarize the data into one bundle string per company and then count the bundles. For example:

library(dplyr)

dd %>% arrange(Comp1, Product) %>% 
  group_by(Comp1) %>% 
  summarize(bundle=paste(unique(Product), collapse=",")) %>% 
  count(bundle)

#     bundle     n
#      <chr> <int>
# 1 P1,P2,P3     2
# 2    P2,P4     1

With the test data:

dd <- read.table(text="Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y", header=TRUE)
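
The pipeline above groups by Comp1 only, which is enough for the sample data because each Comp1 value pairs with exactly one Comp2. If a company could appear in more than one pair, grouping by both columns may be closer to what the question describes. The following is a minimal sketch under that assumption; it also sorts the products so the bundle string does not depend on row order. The `.groups` argument assumes dplyr 1.0.0 or later, and the `Count` column name and `res` variable are my own choices, not part of the original answer.

```r
library(dplyr)

dd <- read.table(text = "Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y", header = TRUE, stringsAsFactors = FALSE)

res <- dd %>%
  group_by(Comp1, Comp2) %>%                      # one group per company pair (assumption)
  summarize(bundle = paste(sort(unique(Product)), collapse = ","),
            .groups = "drop") %>%                 # sorted, de-duplicated products per pair
  count(bundle, name = "Count")                   # number of pairs sharing each bundle
res
```

On the sample data this should give the same two bundles and counts as the original pipeline.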

Answer 2

A data.table solution:

library(data.table)
setDT(d)[order(Product), Prod.Bundle := toString(Product), by = Comp1
         ][, .(Count = uniqueN(Comp2)), by = Prod.Bundle]

Or another one, provided by @Frank in the comments:

setDT(d)[order(Product), toString(Product), by = Comp1
         ][, .(Count = .N), by = .(Prod.Bundle = V1)]

which gives:

   Prod.Bundle Count
1:  P1, P2, P3     2
2:      P2, P4     1

Data used:

d <- read.table(text="Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y", header=TRUE, stringsAsFactors=FALSE)
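
Note that the first variant above adds a Prod.Bundle column to d by reference. If you would rather leave d untouched, a sketch along these lines might work: it operates on a copy made with as.data.table(), builds one sorted, de-duplicated bundle per (Comp1, Comp2) pair, and then counts the pairs per bundle. Grouping by both columns and the `res` name are assumptions of mine, not part of the original answer.

```r
library(data.table)

d <- read.table(text = "Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y", header = TRUE, stringsAsFactors = FALSE)

# as.data.table() returns a copy, so d itself stays a plain data.frame
res <- as.data.table(d)[
  , .(Prod.Bundle = toString(sort(unique(Product)))),  # one bundle per pair
  by = .(Comp1, Comp2)
][
  , .(Count = .N),                                     # pairs per bundle
  by = Prod.Bundle
]
res
```

As in the answer above, toString() separates the products with a comma and a space.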

Answer 3

If, like me, you prefer base R, here is one idea:

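# dd is the test data from Answer 1
# Rows: company pairs; columns: products; cells: how often the product occurs for the pair
dtb <- table(paste(dd[[1]], dd[[3]]), dd[[2]])
# For each pair, paste together the products that occur exactly once
out <- sapply(seq_len(nrow(dtb)), function(x)
  paste(colnames(dtb)[dtb[x, ] == 1], collapse = ","))
table(out)

# out
# P1,P2,P3    P2,P4 
#        2        1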
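
The table-based approach above relies on each cell of dtb being exactly 1, so a product listed twice for the same pair would be dropped from its bundle. A possible alternative, sketched here under the assumption that a bundle is the set of distinct products per (Comp1, Comp2) pair, uses split() and vapply(). The names `bundles` and `counts` are illustrative only.

```r
dd <- read.table(text = "Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y", header = TRUE, stringsAsFactors = FALSE)

# Split the products by company pair, then build one sorted,
# de-duplicated bundle string per pair
bundles <- vapply(
  split(dd$Product, paste(dd$Comp1, dd$Comp2)),
  function(p) paste(sort(unique(p)), collapse = ","),
  character(1)
)

# Count how many pairs share each bundle
counts <- as.data.frame(table(bundles), stringsAsFactors = FALSE)
names(counts) <- c("Product.Bundle", "Count")
counts
```

Because duplicates are removed with unique(), repeated rows for the same pair and product do not change the bundle.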
