I have a data.frame:
df <- data.frame(id = rep(1:4, each = 3), x = c("A","B","C","D","E","A","A","C","D","A","C","E"))
I want to count the connections (pairs of x values that occur together) within each id. This is my desired output:
connections | num. of connections
A - B | 1
B - C | 1
C - D | 1
A - C | 3
A - E | 2
A - D | 2
D - E | 1
C - E | 1
How can I do this in dplyr?
Answer 0 (score: 6)
It sounds like you are just looking for the crossprod function, which you can use like this:
crossprod(table(df))
#    x
# x   A B C D E
#   A 4 1 3 2 2
#   B 1 1 1 0 0
#   C 3 1 3 1 1
#   D 2 0 1 2 1
#   E 2 0 1 1 2
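To see why this works, here is a minimal sketch, assuming the same df as in the question. table(df) is an id-by-x count matrix, and crossprod() of a matrix M is t(M) %*% M, so entry [i, j] sums, over all ids, the product of the counts of value i and value j. That sum equals the number of shared ids only when each value appears at most once per id, which holds for the sample data. The names M and X are just illustrative.

```r
df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"))

# Count matrix: one row per id, one column per value of x
M <- table(df)

# crossprod(M) is the same as t(M) %*% M
X <- crossprod(M)
stopifnot(all(X == t(M) %*% M))

# Off-diagonal entries count the ids containing both values;
# the diagonal counts how often each value occurs overall
X["A", "C"]
```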
This gets you closer to your desired output:
library(reshape2)
X <- crossprod(table(df))
X[upper.tri(X, diag = TRUE)] <- NA
melt(X, na.rm = TRUE)
# x x value
# 2 B A 1
# 3 C A 3
# 4 D A 2
# 5 E A 2
# 8 C B 1
# 9 D B 0
# 10 E B 0
# 14 D C 1
# 15 E C 1
# 20 E D 1
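If you would rather stay in base R and get the "A - B" labels directly, something along these lines might work. This is only a sketch: it assumes the same df, and the names idx, res and numConn are made up for illustration. Unlike the melt() result, it also drops pairs that never co-occur.

```r
df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"))
X <- crossprod(table(df))

# Keep each unordered pair once (strict lower triangle) and skip zero counts
idx <- which(lower.tri(X) & X > 0, arr.ind = TRUE)

# In the lower triangle row > column, so the column label sorts first
res <- data.frame(
  connections = paste(colnames(X)[idx[, 2]], rownames(X)[idx[, 1]], sep = " - "),
  numConn = X[idx]
)
res
```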
Answer 1 (score: 3)
Using dplyr and combn:
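library(dplyr)
# Note: mutate() needs one value per row; this works here because every id
# has exactly 3 rows, and 3 values yield choose(3, 2) = 3 pairs.
df %>%
  group_by(id) %>%
  mutate(connections = c(combn(as.character(x), 2,
                               FUN = function(pair) paste(sort(pair), collapse = " - ")))) %>%
  group_by(connections) %>%
  summarise(numConn = n())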
# connections numConn
#1 A - B 1
#2 A - C 3
#3 A - D 2
#4 A - E 2
#5 B - C 1
#6 C - D 1
#7 C - E 1
#8 D - E 1
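The mutate() call above depends on each id having exactly three rows, since three values happen to produce three pairs. For groups of other sizes, a variant like the following might be more robust. It is a sketch, not the original answer's method: it assumes dplyr 1.1.0 or later (for reframe()), and that every id has at least two rows, because combn() stops with an error when asked for pairs from fewer than two elements.

```r
library(dplyr)

df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"))

res <- df %>%
  group_by(id) %>%
  # reframe() may return any number of rows per group, unlike mutate();
  # sorting first means each pair is pasted in alphabetical order
  reframe(connections = combn(sort(as.character(x)), 2,
                              FUN = paste, collapse = " - ")) %>%
  # Tally how many ids produced each pair
  count(connections, name = "numConn")

res
```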
With data.table:
library(data.table)
setDT(df)[, combn(as.character(x), 2,
                  FUN = function(pair) paste(sort(pair), collapse = " - ")),
          by = id][
  , list(numConn = .N), by = list(connections = V1)]
# connections numConn
#1: A - B 1
#2: A - C 3
#3: B - C 1
#4: D - E 1
#5: A - D 2
#6: A - E 2
#7: C - D 1
#8: C - E 1