
Question

I have a data.frame like this:

x1 <- data.frame(id=1:3,item=c("A","B","A","B","C","D"))
x1[order(x1$item),]
  id item
1  1    A
3  3    A
2  2    B
4  1    B
5  2    C
6  3    D

I want to get:

id1=c(1,2,1,3,2,3)
id2 = c(2,1,3,1,3,2)
A=c(0,0,1,1,0,0)
B=c(1,1,0,0,0,0)
C = 0
D=0
datawanted <- data.frame(id1,id2,A,B,C,D)
  id1 id2 A B C D
1   1   2 0 1 0 0
2   2   1 0 1 0 0
3   1   3 1 0 0 0
4   3   1 1 0 0 0
5   2   3 0 0 0 0
6   3   2 0 0 0 0

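If person1 and person2 both have B, then in the datawanted data frame column B gets 1; otherwise it gets 0. The same rule applies to every item column (e.g. ids 1 and 3 both have A, so their rows get A = 1).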
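To make the rule concrete, here is a small base-R sketch that computes the item columns for the pair (1, 2) by hand. The names `items`, `items1`, `items2` and `shared` are illustrative only and are not part of the question.

```r
# Toy data from the question (id = 1:3 is recycled to 1,2,3,1,2,3)
x1 <- data.frame(id = 1:3, item = c("A", "B", "A", "B", "C", "D"))
items <- sort(unique(as.character(x1$item)))   # "A" "B" "C" "D"

# Items held by each person in the pair (1, 2)
items1 <- x1$item[x1$id == 1]   # "A" "B"
items2 <- x1$item[x1$id == 2]   # "B" "C"

# An item column is 1 only when both people hold that item
shared <- as.integer(items %in% items1 & items %in% items2)
names(shared) <- items
shared
# A B C D
# 0 1 0 0
```

This reproduces the first row of datawanted (id1 = 1, id2 = 2).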

Can anyone suggest an approach or functions in R to handle this?

Answer 1

Cool question. You have a bipartite graph, so following Gabor's tutorial ...

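library(igraph)
# Each row of x1 (id, item) becomes an edge id -> item
g = graph_from_edgelist(as.matrix(x1))
# Mark the item vertices (capital letters) as the second vertex type, making g bipartite
V(g)$type = grepl("[A-Z]", V(g)$name)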

For the OP's desired output, we can first extract the incidence matrix:

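# get.incidence() is an older name; newer igraph versions call this as_incidence_matrix() / as_biadjacency_matrix()
gi = get.incidence(g)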
#   A B C D
# 1 1 1 0 0
# 2 0 1 1 0
# 3 1 0 0 1

Note (thanks to @thelatemail): if you don't want to use igraph, you can build gi directly with table(x1).
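A minimal igraph-free sketch of that note: `table()` cross-tabulates id against item, so for this data it holds the same 0/1 values as `get.incidence(g)` above (rows are ids, columns are items).

```r
# Same toy data as in the question (id = 1:3 is recycled to 1,2,3,1,2,3)
x1 <- data.frame(id = 1:3, item = c("A", "B", "A", "B", "C", "D"))

# Rows are ids, columns are items; each cell counts how often that id has that item
gi <- table(x1)
print(gi)
```

Because `table()` counts, an id that listed the same item twice would get a 2. That case does not occur in the question; if it could, something like `(table(x1) > 0) * 1` would keep the cells at 0/1.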

Then we go through every combination of two ids:

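# For each unordered pair of rows x (one pair per column of combn), return the two ids
# followed by pmin() of their incidence rows, which is 1 only for items both ids have
res = t(combn(nrow(gi), 2, function(x) c(
    as.integer(rownames(gi)[x]), 
    pmin( gi[x[1], ], gi[x[2], ] ) 
)))

dimnames(res) <- list( NULL, c("id1", "id2", colnames(gi)))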
#      id1 id2 A B C D
# [1,]   1   2 0 1 0 0
# [2,]   1   3 1 0 0 0
# [3,]   2   3 0 0 0 0
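The two ideas behind the `combn()` step can be checked separately in a short sketch (the names `pairs` and `same` are illustrative): each column of `combn(n, 2)` is one unordered pair of row indices, and on 0/1 rows `pmin()` behaves like an element-wise AND.

```r
x1 <- data.frame(id = 1:3, item = c("A", "B", "A", "B", "C", "D"))
gi <- table(x1)   # id x item incidence, as in @thelatemail's note

# Each column is one unordered pair of row indices
pairs <- combn(nrow(gi), 2)
pairs
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3

# For every pair, compare pmin() of the two rows with an explicit AND
same <- vapply(seq_len(ncol(pairs)), function(k) {
  a <- gi[pairs[1, k], ]
  b <- gi[pairs[2, k], ]
  all(pmin(a, b) == as.integer(a == 1 & b == 1))
}, logical(1))
same
# [1] TRUE TRUE TRUE
```

The equivalence relies on the cells being 0/1; with counts above 1, `pmin()` would return the smaller count instead.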

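This is basically the OP's desired output, except that the OP's version also includes the redundant mirrored rows (e.g. both 1,2 and 2,1).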
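If you do want the mirrored rows in the same order as datawanted, one possible sketch is to swap id1 and id2 in a copy of `res` and interleave the two matrices. It uses `table(x1)` instead of igraph so that it runs on its own; `flipped` and `both` are illustrative names.

```r
x1 <- data.frame(id = 1:3, item = c("A", "B", "A", "B", "C", "D"))
gi <- table(x1)   # id x item incidence (igraph-free)

# One row per unordered id pair, as in the answer
res <- t(combn(nrow(gi), 2, function(x) c(
  as.integer(rownames(gi)[x]),
  pmin(gi[x[1], ], gi[x[2], ])
)))
dimnames(res) <- list(NULL, c("id1", "id2", colnames(gi)))

# Mirror each pair by swapping id1 and id2; the item columns stay the same
flipped <- res
flipped[, c("id1", "id2")] <- res[, c("id2", "id1")]

# Interleave so that (1,2) is followed by (2,1), (1,3) by (3,1), and so on
both <- rbind(res, flipped)[order(rep(seq_len(nrow(res)), 2)), ]
datawanted <- as.data.frame(both)
datawanted
```

`order()` is stable, so each original pair keeps its position and its mirrored copy lands directly after it.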

A fun reason to use a graph (h/t Chris): you can plot it.

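# Colour the two vertex types differently
V(g)$color <- ifelse(V(g)$type, "red", "light blue")
# Put ids at x = 1 and items at x = 2
V(g)$x     <- (1:2)[ V(g)$type + 1 ]
# Within each type, stack the vertices at y = 1, 2, ...
V(g)$y     <- ave(seq_along(V(g)), V(g)$type, FUN = seq_along)
plot(g)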

Alternatively, apparently roughly the same picture can be obtained with

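# layout.bipartite() (layout_as_bipartite() in newer igraph) puts the two types on two rows;
# [,2:1] swaps the x and y columns so they become two columns instead
plot(g, layout = layout.bipartite(g)[,2:1])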

Transform id -> item into {id pair} -> items
