Question

I have a two-dimensional table of distances in a data.frame in R (imported from a CSV file):

           CP000036   CP001063      CP001368
CP000036      0           a            b
CP001063      a           0            c
CP001368      b           c            0

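I would like to "flatten" it, so that the first column holds the values of one axis, the second column holds the values of the other axis, and the third column holds the distance: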

Genome1      Genome2       Dist
CP000036     CP001063       a
CP000036     CP001368       b
CP001063     CP001368       c

The above is ideal, but it would be completely fine to have repetition, so that every cell in the input matrix gets its own row:

Genome1      Genome2       Dist
CP000036     CP000036       0
CP000036     CP001063       a
CP000036     CP001368       b
CP001063     CP000036       a
CP001063     CP001063       0
CP001063     CP001368       c
CP001368     CP000036       b
CP001368     CP001063       c
CP001368     CP001368       0
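As a minimal sketch of this "one row per cell" layout in base R: the toy data frame below mirrors the 3x3 example, with the letters standing in for real distances. The names `m` and `long_all` are illustrative and not from the original post.

```r
# Toy version of the 3x3 table from the question (letters stand in for distances).
dm <- data.frame(CP000036 = c("0", "a", "b"),
                 CP001063 = c("a", "0", "c"),
                 CP001368 = c("b", "c", "0"),
                 row.names = c("CP000036", "CP001063", "CP001368"),
                 stringsAsFactors = FALSE)

m <- as.matrix(dm)

# Walk the matrix row by row: t(m) read in column-major order is m read
# in row-major order, so Genome1 changes slowest and Genome2 fastest.
long_all <- data.frame(Genome1 = rep(rownames(m), each = ncol(m)),
                       Genome2 = rep(colnames(m), times = nrow(m)),
                       Dist    = as.vector(t(m)),
                       stringsAsFactors = FALSE)

print(long_all)
```

This keeps the diagonal and both triangles, so an n x n input yields n * n rows.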

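This is an example 3x3 matrix, but my dataset is much larger (about 2000x2000). I would do this in Excel, but the output needs about 3 million rows, whereas Excel's maximum is ~1 million rows.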
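A rough check of the sizes involved, assuming exactly 2000 genomes (the question only says "about 2000"), might look like the following. The variable names are illustrative, and the row limit used is that of a modern .xlsx worksheet.

```r
n <- 2000                          # assumed number of genomes

all_cells    <- n * n              # every cell: diagonal plus both triangles
unique_pairs <- n * (n - 1) / 2    # strictly upper triangle only
excel_max    <- 1048576            # rows available in one .xlsx worksheet

print(c(all_cells = all_cells, unique_pairs = unique_pairs))

# Either layout exceeds what a single worksheet can hold.
print(all_cells > excel_max)
print(unique_pairs > excel_max)
```

Under that assumption, both the full layout and the pairs-only layout are too large for a single worksheet.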

This question is very similar to "How to "flatten" or "collapse" a 2D Excel table into 1D?"

Answer 1

So, here is one solution using melt from the package reshape2:

dm <- 
  data.frame( CP000036 = c( "0", "a", "b" ),
              CP001063 = c( "a", "0", "c" ),
              CP001368 = c( "b", "c", "0" ),
              stringsAsFactors = FALSE,
              row.names = c( "CP000036", "CP001063", "CP001368" ) )

# assuming the distance follows a metric we avoid everything below and on the diagonal
dm[ lower.tri( dm, diag = TRUE ) ]  <- NA
dm$Genome1 <- rownames( dm )

# finally melt and avoid the entries below the diagonal with na.rm = TRUE
library(reshape2) 
dm.molten <- melt( dm, na.rm= TRUE, id.vars="Genome1",
                   value.name="Dist", variable.name="Genome2" )

print( dm.molten )
   Genome1  Genome2 Dist
4 CP000036 CP001063    a
7 CP000036 CP001368    b
8 CP001063 CP001368    c

There may be more performant solutions, but I like this one because it is plain and simple.
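One possible alternative for larger inputs is to index the upper triangle directly in base R, without reshape2. This is only a sketch under stated assumptions: the distances are numeric, the matrix is symmetric, and the genome IDs (`G0001` and so on) and the random data are made up for illustration.

```r
set.seed(1)
ids <- sprintf("G%04d", 1:5)                        # hypothetical genome IDs
m <- as.matrix(dist(matrix(rnorm(10), nrow = 5)))   # small symmetric distance matrix
dimnames(m) <- list(ids, ids)

# Row/column positions of every cell strictly above the diagonal.
idx <- which(upper.tri(m), arr.ind = TRUE)

flat <- data.frame(Genome1 = rownames(m)[idx[, "row"]],
                   Genome2 = colnames(m)[idx[, "col"]],
                   Dist    = m[idx],                # a 2-column index picks single cells
                   stringsAsFactors = FALSE)

print(flat)
```

The rows come out ordered by column (Genome2) rather than by Genome1, so a call to `order()` would be needed if the ordering shown in the question matters.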

How to "flatten" or "collapse" a 2D data frame into a 1D data frame in R?
