Question

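I want to combine/pair multiple columns of a data frame so that each pair of column cells from the same row becomes one row of the result. For example, df1 should be converted to df2.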

df1

col1 col2 col3
1    2    3   
0    0    1

df2

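The solution should scale to df1s with (way) more than three columns.

I thought about melt/reshape/dcast, but have not found a solution yet. There are no NAs in the data frame. Thanks!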

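Edit: reshape just produced errors, so I thought about

comb1 <- combn(df1[1,], 2)
comb2 <- t(comb1)

and looping over all rows and appending the results. With 2 million rows, that is inefficient.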

Answer 1

Here is the approach I would take.

Create a function that uses rbindlist from "data.table" and combn from base R. The function looks like this:

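library(data.table)

lengthener <- function(indf) {
  # Build one two-column table per pair of column names and stack them;
  # idcol = TRUE stores the index of the pair in a column named ".id".
  temp <- rbindlist(
    combn(names(indf), 2, FUN = function(x) indf[x], simplify = FALSE),
    use.names = FALSE, idcol = TRUE)
  # Overwrite .id with the original row number, sort so that each input
  # row's pairs sit together, then drop the helper column.
  setorder(temp[, .id := sequence(.N), by = .id], .id)[, .id := NULL][]
}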

Here is the sample data from the other answer, and the function applied to it:

df1 = as.data.frame(matrix(c(1,2,3,4,0,0,1,1), byrow = TRUE, nrow = 2))

lengthener(df1)
#     V1 V2
#  1:  1  2
#  2:  1  3
#  3:  1  4
#  4:  2  3
#  5:  2  4
#  6:  3  4
#  7:  0  0
#  8:  0  1
#  9:  0  1
# 10:  0  1
# 11:  0  1
# 12:  1  1
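To make the two steps inside lengthener easier to follow, here is a minimal base-R sketch of the same idea: build one block per column pair, stack the blocks, then reorder by original row number. It is not the answer's code; the names pairs, blocks, stacked and result, and the helper column row, are made up for illustration.

```r
# Illustrative base-R sketch; all object names here are hypothetical.
df1 <- as.data.frame(matrix(c(1, 2, 3, 4, 0, 0, 1, 1), byrow = TRUE, nrow = 2))

# All unordered pairs of column names, in combn's order
pairs <- combn(names(df1), 2, simplify = FALSE)

# One block per pair, tagged with the row number each value came from
blocks <- lapply(pairs, function(p) {
  data.frame(row = seq_len(nrow(df1)), V1 = df1[[p[1]]], V2 = df1[[p[2]]])
})

stacked <- do.call(rbind, blocks)

# order() leaves ties in their original order, so within each input row
# the pairs stay in combn's order
result <- stacked[order(stacked$row), c("V1", "V2")]
rownames(result) <- NULL
result
```

The stacked table is grouped by pair, not by input row, which is why the reordering step is needed before the result matches the layout shown above.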

Testing on some larger data:

set.seed(1)
M <- as.data.frame(matrix(sample(100, 100*100, TRUE), 100))
system.time(out <- lengthener(M))
#    user  system elapsed 
#    0.19    0.00    0.19 
out
#         V1 V2
#      1: 27 66
#      2: 27 27
#      3: 27 68
#      4: 27 66
#      5: 27 56
#     ---      
# 494996: 33 13
# 494997: 33 66
# 494998: 80 13
# 494999: 80 66
# 495000: 13 66

System time for the other approach:

funAMK <- function(indf) {
  nrow_combn = nrow(t(combn(indf[1,], m = 2)))
  nrow_df = nrow(indf) * nrow_combn
  df2 = data.frame(V1 = rep(0, nrow_df), V2 = rep(0, nrow_df))
  for(i in 1:nrow(indf)){
    df2[(((i-1)*nrow_combn)+1):(i*(nrow_combn)), ] = data.frame(t(combn(indf[i,], m = 2)))
  }
  df2
}

system.time(funAMK(M))
#    user  system elapsed 
#   16.03    0.16   16.37

Answer 2

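Your edit is very similar to my answer below; you just need to rbind the result of each iteration over the rows of df1. Using data.table is a good way to speed up rbind; see this answer for more.

Edit: Unfortunately, when I switched to the data.table approach, it turned out that rbindlist() produced a wrong answer (as pointed out in the comments below). So, although it may be slightly slower, I think preallocating a data frame and using rbind is probably the best option.

EDIT2: Switched the preallocated df to a more general number of rows.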

df1 = as.data.frame(matrix(c(1,2,3,4,0,0,1,1), byrow = TRUE, nrow = 2))
nrow_combn = nrow(t(combn(df1[1,], m = 2)))
nrow_df = nrow(df1) * nrow_combn
df2 = data.frame(V1 = rep(0, nrow_df), V2 = rep(0, nrow_df))
for(i in 1:nrow(df1)){
  df2[(((i-1)*nrow_combn)+1):(i*(nrow_combn)), ] = data.frame(t(combn(df1[i,], m = 2)))
}
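As a possible alternative to the row-by-row loop, the pairing can also be done with matrix indexing. This is a sketch under assumptions, not code from either answer: pair_columns is a hypothetical helper, and it assumes all columns share one type, because as.matrix coerces them to a common type.

```r
# Vectorized base-R sketch; pair_columns is a hypothetical helper name.
df1 <- as.data.frame(matrix(c(1, 2, 3, 4, 0, 0, 1, 1), byrow = TRUE, nrow = 2))

pair_columns <- function(indf) {
  m <- as.matrix(indf)
  idx <- combn(ncol(m), 2)   # 2 x choose(ncol, 2) matrix of column indices
  # m[, idx[1, ]] has one column per pair. Transposing before flattening
  # lists all pairs of row 1 first, then all pairs of row 2, and so on.
  data.frame(V1 = as.vector(t(m[, idx[1, ], drop = FALSE])),
             V2 = as.vector(t(m[, idx[2, ], drop = FALSE])))
}

df2 <- pair_columns(df1)
df2
```

Here combn is called once on the column indices instead of once per row, so the work per row reduces to indexing.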

R: Combine multiple columns into pairs of column cells within the same row