I have a data table arranged like this:
ID       Category 1   Category 2   Category 3
Name 1   Example 1    Example 2    Example 3
Name 2   Example 1    Example 2    Example 4
Name 3   Example 5    Example 6    Example 4
....     ....         ....         .....
I'm trying to turn it into a table like this:
         Name 1   Name 2   Name 3   ....
Name 1   0        2        0
Name 2   2        0        1
Name 3   0        1        0
....
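Each cell in the output table shows how many categories are the same when two IDs are compared. It could also be how many categories differ; either one works. I've looked at adjacency matrices and sociomatrices on Stack Overflow, as well as some matrix-matching suggestions, but I don't think my data table is set up correctly for them. Does anyone have suggestions on how to do this?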
Edit: Ah, apologies. I'm using R for this; I left that bit out.

Answer:
You can first put the data into long format, which makes this quite simple:
# your data
tdf <- data.frame(ID   = paste0("Name ", 1:3),
                  cat1 = paste0("Example ", c(1, 1, 5)),
                  cat2 = paste0("Example ", c(2, 2, 6)),
                  cat3 = paste0("Example ", c(3, 4, 4)))
tdf
#> ID cat1 cat2 cat3
#> 1 Name 1 Example 1 Example 2 Example 3
#> 2 Name 2 Example 1 Example 2 Example 4
#> 3 Name 3 Example 5 Example 6 Example 4
# the categories are extraneous, what matters is the relationship of ID to
# the Example values, so we melt the df to long format using the
# melt function from the package reshape2
lfd <- reshape2::melt(tdf, id.vars = "ID")
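#> Warning: attributes are not identical across measure variables; they will
#> be dropped
# (This warning appears when the category columns are factors, which was the
# data.frame() default before R 4.0.0; it is harmless here because only the
# values themselves are needed.)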
# create an affiliation (incidence) matrix: one row per ID, one column per
# Example value, with a 1 where that ID has that value
adj1 <- as.matrix(table(lfd$ID, lfd$value))
adj1
#>         
#>          Example 1 Example 2 Example 3 Example 4 Example 5 Example 6
#>   Name 1         1         1         1         0         0         0
#>   Name 2         1         1         0         1         0         0
#>   Name 3         0         0         0         1         1         1
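# The ID-by-ID adjacency matrix is the affiliation matrix multiplied by its
# transpose: entry [i, j] counts the Example values shared by IDs i and j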
id_id_adj_mat <- adj1 %*% t(adj1)
# Set the diagonal to zero (currently the diagonal holds the degree of each
# node, i.e. the number of values each ID has)
diag(id_id_adj_mat) <- 0
id_id_adj_mat
#>         
#>          Name 1 Name 2 Name 3
#>   Name 1      0      2      0
#>   Name 2      2      0      1
#>   Name 3      0      1      0
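For comparison, here is a minimal base-R sketch of the same idea that skips the reshape2 dependency (so the melt warning cannot arise), plus a column-by-column variant. It assumes the categories are stored as separate columns exactly as in `tdf`; the names `cat_cols`, `long_id`, `affil`, `shared`, `same_cat` and `diff_cat` are illustrative, not from the original answer. Note that the two counts answer slightly different questions: the affiliation-matrix approach counts Example values two IDs share in any column, while the column-by-column variant only counts a match when both IDs have the same value in the same category. The "how many differ" version the question also accepts is then the number of category columns minus the match count.

```r
# Hypothetical base-R variant; assumes the same layout as tdf above
tdf <- data.frame(ID   = paste0("Name ", 1:3),
                  cat1 = paste0("Example ", c(1, 1, 5)),
                  cat2 = paste0("Example ", c(2, 2, 6)),
                  cat3 = paste0("Example ", c(3, 4, 4)),
                  stringsAsFactors = FALSE)
cat_cols <- setdiff(names(tdf), "ID")

# 1) Values shared in any column: build the long format by hand.
#    unlist() stacks the category columns one after another, so repeating
#    the ID vector once per column keeps IDs and values aligned.
long_id    <- rep(tdf$ID, times = length(cat_cols))
long_value <- unlist(tdf[cat_cols], use.names = FALSE)
affil  <- unclass(table(long_id, long_value))  # IDs x values affiliation matrix
shared <- tcrossprod(affil)                    # same as affil %*% t(affil)
diag(shared) <- 0

# 2) Matches only within the same category column: compare each column
#    with itself for every pair of IDs and add up the TRUEs.
same_cat <- Reduce(`+`,
                   lapply(cat_cols, function(col) outer(tdf[[col]], tdf[[col]], "==")),
                   0L)
dimnames(same_cat) <- list(tdf$ID, tdf$ID)
# Number of categories that differ (its diagonal is already 0)
diff_cat <- length(cat_cols) - same_cat
diag(same_cat) <- 0

print(shared)
print(same_cat)
print(diff_cat)
```

On the sample data, `shared` and `same_cat` coincide (2 for Name 1/Name 2, 1 for Name 2/Name 3, 0 otherwise); they can diverge on other data when the same Example value appears under different categories, so pick whichever definition matches what "the same category" should mean.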