Question

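I'm trying to create a cross-product matrix of unique users in R. I searched SO for this but couldn't find what I'm looking for. Any help is appreciated. I have a large data frame (over a million rows); here is a sample, as printed from df: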

   Products Users
1 Product a user1
2 Product b user1
3 Product a user2
4 Product c user1
5 Product b user2
6 Product c user3

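I would like to see two matrices. The first shows the number of unique users who have either product (OR), so the output would be something like:

            Product a   Product b   Product c
Product a                 2            3
Product b     2                        3
Product c     3           3

The second matrix is the number of unique users who have both products (AND):

            Product a   Product b   Product c
Product a                 2            1
Product b     2                        1
Product c     1           1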

Any help is appreciated.

Thanks

Update

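To make this clearer: Product a is used by user1 and user2, Product b by user1 and user2, and Product c by user1 and user3. So in the first matrix, the Product a / Product b cell would be 2, because there are 2 unique users across those two products. Similarly, the Product a / Product c cell would be 3. In the second matrix, those cells would be 2 and 1, because I want the intersection. Thanks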
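The two definitions in the update map directly onto R's set functions union() and intersect(). Below is a minimal sketch, assuming the sample data is rebuilt with data.frame(); the helper name pair_matrix is hypothetical and not from the question or the answer. It applies union (OR) or intersect (AND) to every pair of per-product user sets and leaves the diagonal at 0:

```r
# Reproducible copy of the sample data from the question
df <- data.frame(
  Products = c("Product a", "Product b", "Product a", "Product c", "Product b", "Product c"),
  Users    = c("user1", "user1", "user2", "user1", "user2", "user3"),
  stringsAsFactors = FALSE
)

# One vector of unique users per product (names sorted: Product a, b, c)
users_by_product <- lapply(split(df$Users, df$Products), unique)
products <- names(users_by_product)
n <- length(products)

# Hypothetical helper: apply a set function to every off-diagonal pair of products
pair_matrix <- function(set_fun) {
  m <- matrix(0L, n, n, dimnames = list(products, products))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        m[i, j] <- length(set_fun(users_by_product[[i]], users_by_product[[j]]))
      }
    }
  }
  m
}

or_matrix  <- pair_matrix(union)      # users who have either product
and_matrix <- pair_matrix(intersect)  # users who have both products
print(or_matrix)
print(and_matrix)
```

The loops run over products, not rows, so the cost after the single split() pass depends on the number of distinct products rather than on the million-row data frame.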

Answer 1

Try

content

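Or, using combn:

lst <- split(df$Users, df$Products)
ln <- length(lst)
m1 <- matrix(0, ln, ln, dimnames=list(names(lst), names(lst)))
# number of unique users across each pair of products, filled into the lower triangle
m1[lower.tri(m1, diag=FALSE)] <- combn(seq_along(lst), 2, FUN=function(x) length(unique(unlist(lst[x]))))
# mirror into the upper triangle; t() keeps the pairs aligned for any number of products
m1[upper.tri(m1)] <- t(m1)[upper.tri(m1)]
m1
#          Product a Product b Product c
#Product a         0         2         3
#Product b         2         0         3
#Product c         3         3         0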

Or using outer, which gives the same union (OR) matrix:

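# f1: number of unique users across products u and v (their union)
f1 <- function(u, v) length(unique(unlist(c(lst[[u]], lst[[v]]))))
# evaluate f1 on every pair of product indices; multiplying by !diag(...) zeroes the diagonal
res <- outer(seq_along(lst), seq_along(lst), FUN=Vectorize(f1)) * !diag(length(lst))
dimnames(res) <- rep(list(names(lst)), 2)
res
#          Product a Product b Product c
#Product a         0         2         3
#Product b         2         0         3
#Product c         3         3         0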
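For the AND matrix, one alternative sketch (a swapped-in technique, not the answerer's code) builds a 0/1 user-by-product incidence matrix with table() and takes crossprod(): entry (p, q) of t(inc) %*% inc counts the users who have both p and q. The OR matrix then follows by inclusion-exclusion, |A ∪ B| = |A| + |B| - |A ∩ B|. The variable names tab, inc, and_mat, n_users and or_mat are illustrative:

```r
# Reproducible copy of the sample data from the question
df <- data.frame(
  Products = c("Product a", "Product b", "Product a", "Product c", "Product b", "Product c"),
  Users    = c("user1", "user1", "user2", "user1", "user2", "user3"),
  stringsAsFactors = FALSE
)

# Users x products count table, stripped of its "table" class
tab <- unclass(table(df$Users, df$Products))
# 0/1 incidence: 1 if the user has the product at least once (repeated rows collapse)
inc <- (tab > 0) * 1L

# AND: entry [p, q] = number of users having both p and q
and_mat <- crossprod(inc)
# the diagonal holds the number of unique users per product
n_users <- diag(and_mat)
# OR via inclusion-exclusion: |p or q| = |p| + |q| - |p and q|
or_mat <- outer(n_users, n_users, "+") - and_mat

# blank the diagonals to match the question's layout
diag(and_mat) <- 0
diag(or_mat) <- 0
print(and_mat)
print(or_mat)
```

The table is dense (users × products), so with a very large number of distinct users a sparse representation, for example from the Matrix package, may be needed; that variant is not shown here.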

Create a matrix of unique user item cross-product combinations
