Rebuilding a symmetric matrix from values in long form

Time: 2015-06-28 01:58:54

Tags: r matrix reshape

I have a table in long format that looks like this:

  one   two   value
  a     b     30
  a     c     40
  a     d     20
  b     c     10
  b     d     05
  c     d     30

I'm trying to convert it into a data frame in R (or pandas) that looks like this:

    a  b  c  d 
a   00 30 40 20
b   30 00 10 05 
c   40 10 00 30
d   20 05 30 00

The problem is that my TSV only defines a,b and not b,a, so I end up with a lot of NAs in my data frame.

The end goal is a distance matrix for clustering. Any help would be much appreciated.
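Once the symmetric matrix exists, base R's as.dist() turns it into a "dist" object that hclust() accepts. A minimal sketch, assuming the matrix shown above has been typed in by hand; average linkage is an arbitrary choice here, not something the question specifies:

```r
# Symmetric distance matrix from the question (labels a-d)
labs <- c("a", "b", "c", "d")
m <- matrix(c( 0, 30, 40, 20,
              30,  0, 10,  5,
              40, 10,  0, 30,
              20,  5, 30,  0),
            nrow = 4, byrow = TRUE, dimnames = list(labs, labs))

# as.dist() keeps only the lower triangle and returns a "dist" object
d <- as.dist(m)

# Hierarchical clustering; "average" (UPGMA) linkage is an assumption
hc <- hclust(d, method = "average")
hc$height
```

plot(hc) draws the dendrogram. Because as.dist() reads only the lower triangle, the matrix should already be symmetric for the result to mean what you expect.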

5 answers:

Answer 0 (score: 7)

An igraph solution: read in the data frame and treat the values as edge weights. You can then convert the graph to an adjacency matrix:

dat <- read.table(header=TRUE, text=" one   two   value
  a     b     30
  a     c     40
  a     d     20
  b     c     10
  b     d     05
  c     d     30")

library(igraph)

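# Make the graph undirected so that the adjacency matrix will be symmetric
# (newer igraph versions name these graph_from_data_frame / as_adjacency_matrix)
g <- graph.data.frame(dat, directed=FALSE)

# use the 'value' edge attribute as the matrix entries
get.adjacency(g, attr="value", sparse=FALSE)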
#   a  b  c  d
#a  0 30 40 20
#b 30  0 10  5
#c 40 10  0 30
#d 20  5 30  0

Answer 1 (score: 2)

Another approach is reshape::cast:

library(reshape)  # provides cast()

df.long = data.frame(one=c('a','a','a','b','b','c'),
                     two=c('b','c','d','c','d','d'),
                     value=c(30,40,20,10,05,30) )

# cast recovers only the triangle present in the data; missing cells are filled with 0
df <- as.matrix( cast(df.long, one ~ two, fill=0) )
#    b  c  d
# a 30 40 20
# b  0 10  5
# c  0  0 30

So we build a matrix with the full set of indices and insert the values:

indices <- sort(unique(c(as.character(df.long$one), as.character(df.long$two))))
df <- matrix(0, nrow=length(indices), ncol=length(indices), dimnames=list(indices, indices))
# assumes the full upper triangle is present and sorted (as Robert's answer ensures)
m <- as.matrix( cast(df.long, one ~ two, fill=0) )
df[upper.tri(df)] <- m[upper.tri(m, diag=TRUE)]
# mirror via the transpose: upper.tri/lower.tri both run in column-major order
df[lower.tri(df)] <- t(df)[lower.tri(df)]
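The transpose matters because upper.tri() and lower.tri() both enumerate cells in column-major order, so the two triangles are not visited in mirrored order. A small base-R check, using the question's values typed in by hand:

```r
idx <- c("a", "b", "c", "d")
m <- matrix(0, 4, 4, dimnames = list(idx, idx))

# upper.tri() visits (1,2),(1,3),(2,3),(1,4),(2,4),(3,4): column-major order
m[upper.tri(m)] <- c(30, 40, 10, 20, 5, 30)

# Copying directly: lower.tri() visits (2,1),(3,1),(4,1),(3,2),(4,2),(4,3),
# so values land in cells that are not the mirror images of their sources
wrong <- m
wrong[lower.tri(wrong)] <- wrong[upper.tri(wrong)]

# Reading the upper triangle through t() lines the positions up correctly
right <- m
right[lower.tri(right)] <- t(right)[lower.tri(right)]

isSymmetric(wrong)
isSymmetric(right)
```

Here wrong["d", "a"] ends up as 10 (the b-c distance) instead of 20.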

Update: the original sketch included these manual kludges.

Manually add the missing row 'd' and column 'a', then fill in the lower triangle by adding the transpose t(df):

df <- cbind(a=rep(0,4), rbind(df, d=rep(0,3)))
#   a  b  c  d
# a 0 30 40 20
# b 0  0 10  5
# c 0  0  0 30
# d 0  0  0  0

df + t(df)
#    a  b  c  d
# a  0 30 40 20
# b 30  0 10  5
# c 40 10  0 30
# d 20  5 30  0

Answer 2 (score: 1)

Make sure your data are sorted, and try the following:

tsv=tsv[with(tsv,order(one,two)),]
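Sorting helps only if each pair is stored in a consistent orientation. A sketch of one way to do that in base R, assuming character labels: put the smaller label first with pmin()/pmax(), then order by the column label before the row label, because upper.tri() fills column by column. The data frame name dat is illustrative:

```r
# Question's data; stringsAsFactors = FALSE keeps labels as character
dat <- data.frame(one = c("a", "a", "a", "b", "b", "c"),
                  two = c("b", "c", "d", "c", "d", "d"),
                  value = c(30, 40, 20, 10, 5, 30),
                  stringsAsFactors = FALSE)

# Canonical orientation: smaller label first, so a (b, a) row would become (a, b)
lo <- pmin(dat$one, dat$two)
hi <- pmax(dat$one, dat$two)
dat$one <- lo
dat$two <- hi

# upper.tri() is column-major, so sort by column label (two), then row label (one)
dat <- dat[order(dat$two, dat$one), ]

labs <- sort(unique(c(dat$one, dat$two)))
m <- matrix(0, length(labs), length(labs), dimnames = list(labs, labs))

# This positional fill is only valid if every off-diagonal pair appears exactly once
stopifnot(nrow(dat) == choose(length(labs), 2))
m[upper.tri(m)] <- dat$value
m <- m + t(m)
m
```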

Answer 3 (score: 1)

You could try:

 un1 <- unique(unlist(df1[1:2]))
 df1[1:2] <- lapply(df1[1:2], factor, levels=un1)
 m1 <- xtabs(value~one+two, df1)
 m1+t(m1)
 #    two
 #one  a  b  c  d
 #a    0 30 40 20
 #b   30  0 10  5
 #c   40 10  0 30
 #d   20  5 30  0

Or you could use row/column indexing:

  m1 <- matrix(0, nrow=length(un1), ncol=length(un1),
                              dimnames=list(un1, un1))
  m1[cbind(match(df1$one, rownames(m1)), 
               match(df1$two, colnames(m1)))] <- df1$value
  m1+t(m1)
  #   a  b  c  d
  #a  0 30 40 20
  #b 30  0 10  5
  #c 40 10  0 30
  #d 20  5 30  0
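A variant of the row/column indexing idea writes each value to both (one, two) and (two, one) in a single assignment, so no m1 + t(m1) step is needed. Unlike adding the transpose, it also does not double a value if a pair happens to appear in both orders. A sketch, assuming df1 holds the question's data with character columns:

```r
df1 <- data.frame(one = c("a", "a", "a", "b", "b", "c"),
                  two = c("b", "c", "d", "c", "d", "d"),
                  value = c(30, 40, 20, 10, 5, 30),
                  stringsAsFactors = FALSE)

un1 <- unique(c(df1$one, df1$two))
m1 <- matrix(0, length(un1), length(un1), dimnames = list(un1, un1))

# Integer row/column positions of each pair
i <- match(df1$one, un1)
j <- match(df1$two, un1)

# Write every value at (i, j) and at its mirror (j, i) in one indexed assignment
m1[cbind(c(i, j), c(j, i))] <- rep(df1$value, 2)
m1
```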

Answer 4 (score: 0)

Here is a base R solution for those who don't want to learn new functions. It builds a symmetric matrix.

df.long = data.frame(one=c('a','a','a','b','b','c'),
                     two=c('b','c','d','c','d','d'),
                     value=c(30,40,20,10,05,30) )

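v <- unique(c(as.character(df.long$one), as.character(df.long$two)))
mx <- sapply(v, function(x) {
    sapply(v, function(y) {
        if (x == y) return(0)  # diagonal: distance of a label to itself
        # each pair is stored once, as (x, y) or (y, x), so match either order
        df.long[df.long$one %in% c(x, y) & df.long$two %in% c(x, y), "value"]
    })
})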
#    a  b  c  d
# a  0 30 40 20
# b 30  0 10  5
# c 40 10  0 30
# d 20  5 30  0