Question

I have a large amount of graph data in the following form. Suppose each person has several interests.

person,interest
1,1
1,2
1,3
2,1
2,5
2,2
3,2
3,5
...

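I want to build all pairs of interests for each person and convert the data into an edge list like the one below. I want the data in this format so that I can turn it into an adjacency matrix for graphing and similar tasks.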

person,x_interest,y_interest
1,1,2
1,1,3
1,2,3
2,1,5
2,1,2
2,5,2
3,2,5

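There is a solution here: Pairs of Observations within Groups, but it only works for small datasets, because the call to table would have to generate more than 2^31 elements. Is there another way to do this without relying on table?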

Answer 1

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1); then, grouped by 'person', take the pairwise combinations of the unique 'interest' values to create two columns ('x_interest' and 'y_interest').

library(data.table)
setDT(df1)[, {tmp <- combn(unique(interest), 2)
              list(x_interest = tmp[c(TRUE, FALSE)], y_interest = tmp[c(FALSE, TRUE)])},
           by = person]

Note: to speed this up, combnPrim from library(gRbase) can be used in place of combn.
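The indexing with tmp[c(TRUE, FALSE)] and tmp[c(FALSE, TRUE)] may look cryptic, so here is a small base R sketch of what it appears to do, using person 2's interests from the question as the input. It relies only on combn returning a matrix with one column per pair and on R storing matrices column by column.

```r
# combn() returns a matrix with one column per pair (2 rows when m = 2)
tmp <- combn(c(1L, 5L, 2L), 2)
# tmp holds the pairs (1,5), (1,2), (5,2), one per column

# R stores matrices column by column, so as a plain vector tmp reads
# 1, 5, 1, 2, 5, 2. The logical index c(TRUE, FALSE) is recycled over
# that vector and keeps the odd positions, i.e. the first row.
x_interest <- tmp[c(TRUE, FALSE)]   # first element of each pair
y_interest <- tmp[c(FALSE, TRUE)]   # second element of each pair

# Equivalent, and arguably easier to read:
same_x <- tmp[1, ]
same_y <- tmp[2, ]
```

Writing tmp[1, ] and tmp[2, ] should give the same columns; the recycled logical index is just a more compact way of spelling it.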

Data

df1 <- structure(list(person = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), 
interest = c(1L, 
2L, 3L, 1L, 5L, 2L, 2L, 5L)), .Names = c("person", "interest"
), class = "data.frame", row.names = c(NA, -8L))
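For comparison, the same edge list can be sketched in base R without data.table. This is not the method from the answer, only an illustration under a few assumptions: pairs_for_person is a hypothetical helper name, the sample data is rebuilt so the block runs on its own, and the approach will likely be slower than data.table on large inputs. The guard for fewer than two interests is an addition: when combn is given a single positive number n, it enumerates combinations of 1:n rather than of that one value.

```r
# Sample data from the answer, rebuilt so this sketch is self-contained
df1 <- data.frame(
  person   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  interest = c(1L, 2L, 3L, 1L, 5L, 2L, 2L, 5L)
)

# Hypothetical helper: all interest pairs for the rows of one person
pairs_for_person <- function(d) {
  u <- unique(d$interest)
  # Guard (an addition): skip people with fewer than two distinct interests,
  # since combn(n, 2) on a single number n would pair up 1:n instead
  if (length(u) < 2L) return(NULL)
  tmp <- combn(u, 2)  # one column per pair
  data.frame(person = d$person[1],
             x_interest = tmp[1, ],
             y_interest = tmp[2, ])
}

# Split by person, build the pairs per group, drop empty groups, and stack
per_person <- lapply(split(df1, df1$person), pairs_for_person)
per_person <- Filter(Negate(is.null), per_person)
edges <- do.call(rbind, unname(per_person))
rownames(edges) <- NULL
edges
```

On the sample data this should reproduce the seven rows of the desired output shown in the question.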
