Question

I have a data table laid out like this:

ID       Category 1       Category 2     Category 3
Name 1   Example 1        Example 2      Example 3
Name 2   Example 1        Example 2      Example 4
Name 3   Example 5        Example 6      Example 4
....    ....             ....            .....

I am trying to turn it into a table like this:

        Name 1     Name 2     Name 3   ....
 Name 1    0        2          0    
 Name 2    2        0          1
 Name 3    0        1          0
  ....

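Each cell in the output table shows how many categories are the same when the two IDs are compared. It could also be how many categories differ; either would work. I have looked into adjacency matrices and sociomatrices on Stack Overflow, as well as some matrix-matching suggestions, but I don't think my data table is set up correctly for them. Does anyone have any suggestions on how to do this?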

Edit: Ah, apologies. I am using R as my program. I left that bit out.

Answer 1

You can do this quite simply by first putting the data into long format:

# your data
tdf <- data.frame(ID = paste0("Name ", 1:3), cat1 = paste0("Example ", c(1,1,5)),
                  cat2 = paste0("Example ", c(2,2,6)),
                  cat3 = paste0("Example ", c(3,4,4)))
tdf
#>       ID      cat1      cat2      cat3
#> 1 Name 1 Example 1 Example 2 Example 3
#> 2 Name 2 Example 1 Example 2 Example 4
#> 3 Name 3 Example 5 Example 6 Example 4

# the categories are extraneous, what matters is the relationship of ID to 
# the Example values, so we melt the df to long format using the 
# melt function from the package reshape2
lfd <- reshape2::melt(tdf, id.vars = "ID")
#> Warning: attributes are not identical across measure variables; they will
#> be dropped

# create an affiliation matrix
adj1 <- as.matrix(table(lfd$ID, lfd$value))
adj1
#>         
#>          Example 1 Example 2 Example 3 Example 4 Example 5 Example 6
#>   Name 1         1         1         1         0         0         0
#>   Name 2         1         1         0         1         0         0
#>   Name 3         0         0         0         1         1         1

# Adjacency matrix is simply the product
id_id_adj_mat <- adj1 %*% t(adj1)
# Set the diagonal to zero (currently diagonal displays degree of each node)
diag(id_id_adj_mat) <- 0
id_id_adj_mat
#>         
#>          Name 1 Name 2 Name 3
#>   Name 1      0      2      0
#>   Name 2      2      0      1
#>   Name 3      0      1      0
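
Note that the approach above matches Example values regardless of which category column they sit in. If the comparison should instead be strictly column by column (an assumption about what the question intends), or if the count of differing categories is preferred, a base R sketch along these lines may help. The names vals, same_mat and diff_mat are made up for illustration, and the data is the same toy example as above.

```r
# Rebuild the example data from the answer (character columns).
tdf <- data.frame(
  ID   = paste0("Name ", 1:3),
  cat1 = paste0("Example ", c(1, 1, 5)),
  cat2 = paste0("Example ", c(2, 2, 6)),
  cat3 = paste0("Example ", c(3, 4, 4)),
  stringsAsFactors = FALSE
)

# Character matrix of the category values only, one row per ID.
vals <- as.matrix(tdf[, -1])
n <- nrow(vals)

# same_mat[i, j] = number of category columns in which rows i and j agree.
same_mat <- matrix(0L, n, n, dimnames = list(tdf$ID, tdf$ID))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    same_mat[i, j] <- sum(vals[i, ] == vals[j, ])
  }
}

# The number of differing categories is the complement of the matches.
diff_mat <- ncol(vals) - same_mat

# Zero the diagonal of the "same" matrix to match the layout in the question.
diag(same_mat) <- 0L

same_mat
diff_mat
```

On this example same_mat agrees with id_id_adj_mat, because no Example value appears in more than one category column. The double loop is quadratic in the number of IDs, so for large tables the matrix product above is likely the better choice.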

Raw data table to adjacency matrix / sociomatrix
