Counting connections in a data.frame with dplyr

Time: 2014-12-16 09:33:11

Tags: r dplyr

I have a data.frame:

 df <- data.frame(id = rep(1:4, each = 3),x = c("A","B","C","D","E","A","A","C","D","A","C","E"))

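I want to count the connections within each id, where two x values that share an id form one connection. This is the output I want: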

connections    |num. of connections
   A - B       | 1
   B - C       | 1
   C - D       | 1
   A - C       | 3
   A - E       | 2
   A - D       | 2
   D - E       | 1
   C - E       | 1

How can I do this in dplyr?

2 answers:

Answer 0: (score: 6)

It sounds like you are just looking for the crossprod function, which you can use like this:

crossprod(table(df))
#    x
# x   A B C D E
#   A 4 1 3 2 2
#   B 1 1 1 0 0
#   C 3 1 3 1 1
#   D 2 0 1 2 1
#   E 2 0 1 1 2
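
For readers wondering why crossprod fits here: table(df) is an id-by-x incidence table, and crossprod(M) computes t(M) %*% M, so cell [i, j] counts the ids that contain both i and j, while the diagonal counts the ids containing each value. A small check of that reading, assuming each x value occurs at most once per id (as in the example data; repeated values would inflate the counts):

```r
df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"))

M <- table(df)        # rows: id, columns: x; 1 where the id contains that x
X <- crossprod(M)     # same values as t(M) %*% M

# Every cell matches the explicit matrix product
stopifnot(all(X == t(M) %*% M))

# The A/C cell is the number of ids containing both A and C (ids 1, 3 and 4)
ac_from_matrix <- X["A", "C"]
ac_by_hand     <- sum(M[, "A"] * M[, "C"])
stopifnot(ac_from_matrix == 3, ac_by_hand == 3)
```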

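Reshaping that matrix gets you closer to your desired output:

library(reshape2)
X <- crossprod(table(df))
# Blank out the diagonal and upper triangle so each pair is kept only once
X[upper.tri(X, diag = TRUE)] <- NA
# Long format; na.rm = TRUE drops the blanked cells
melt(X, na.rm = TRUE)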
#    x x value
# 2  B A     1
# 3  C A     3
# 4  D A     2
# 5  E A     2
# 8  C B     1
# 9  D B     0
# 10 E B     0
# 14 D C     1
# 15 E C     1
# 20 E D     1
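
The melted result still contains pairs that never occur (B/D and B/E have a count of 0) and keeps the two halves of each pair in separate columns. A minimal base-R sketch of one way to finish the job, not part of the original answer: it reads the upper triangle directly instead of melting, and the names idx, res and numConn are chosen here for illustration. It assumes each x value occurs at most once per id.

```r
df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"))

# Co-occurrence matrix: cell [i, j] counts the ids containing both i and j
X <- crossprod(table(df))

# Positions strictly above the diagonal, so each unordered pair appears once
idx <- which(upper.tri(X), arr.ind = TRUE)

res <- data.frame(
  connections = paste(rownames(X)[idx[, "row"]],
                      colnames(X)[idx[, "col"]], sep = " - "),
  numConn = X[idx]
)

# Drop pairs that never share an id
res <- res[res$numConn > 0, ]
res
```

Because the row index is always smaller than the column index in the upper triangle, the labels come out in alphabetical order within each pair ("A - B", not "B - A").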

Answer 1: (score: 3)

Using dplyr and combn:

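library(dplyr)
df %>%
   group_by(id) %>%
   # combn() yields choose(n, 2) pairs per id; mutate() accepts them here only
   # because every id has exactly 3 rows (choose(3, 2) == 3)
   mutate(connections = c(combn(as.character(x), 2,
        FUN = function(pair) paste(sort(pair), collapse = " - ")))) %>%
   group_by(connections) %>%
   summarise(numConn = n())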
 #   connections numConn
 #1       A - B       1
 #2       A - C       3
 #3       A - D       2
 #4       A - E       2
 #5       B - C       1
 #6       C - D       1
 #7       C - E       1
 #8       D - E       1
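
The mutate() call works on this data only because every id has exactly three rows, so the number of pairs happens to equal the group size. A hedged sketch of a variant that should tolerate groups of any size, assuming dplyr 1.1.0 or later (where reframe() is available); pair_labels is a hypothetical helper introduced here, not part of the original answer:

```r
library(dplyr)

df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"))

# Hypothetical helper: all unordered pairs within one id, as "A - B" labels.
# Returns an empty vector for groups with fewer than two distinct values,
# where combn() would otherwise raise an error.
pair_labels <- function(x) {
  x <- sort(unique(as.character(x)))
  if (length(x) < 2) return(character(0))
  combn(x, 2, FUN = paste, collapse = " - ")
}

res <- df %>%
  group_by(id) %>%
  reframe(connections = pair_labels(x)) %>%   # any number of rows per group
  count(connections, name = "numConn")        # one row per distinct pair
res
```

Sorting inside the helper means the same pair always gets the same label regardless of row order, and unique() keeps a value repeated within an id from producing a self-pair such as "A - A".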

The same approach with data.table:

library(data.table)
setDT(df)[,combn(as.character(x),2, FUN= function(x)
           paste(sort(x), collapse=" - ")) , by=id][
                    ,list(numConn=.N), by=list(connections=V1)]
 #    connections numConn
#1:       A - B       1
#2:       A - C       3
#3:       B - C       1
#4:       D - E       1
#5:       A - D       2
#6:       A - E       2
#7:       C - D       1
#8:       C - E       1