Computing Mahalanobis distance by group using data.table (part 2)

Time: 2015-06-30 22:42:45

Tags: r data.table

This is a follow-up to a question I asked before. My sample data and code are:

library(data.table)
library(StatMatch)
as.data.table(mtcars)[, tryCatch(mahalanobis.dist(mpg[vs == 0], mpg[vs == 1]),
                                 error = function(e) as.numeric(NA)),
                      keyby = carb]

 carb        V1
 1:    2 1.0416378
 2:    2 1.6264169
 3:    2 1.6812399
 4:    2 0.9502661
 5:    2 0.2923896
 6:    2 0.7492482
 7:    2 1.3340273
 8:    2 1.3888504
 9:    2 0.6578765
10:    2 0.5847791

... (remaining rows omitted)
    carb        V1

The code above returns all the values in a single column. However, I would like the output in the following format, if possible.

How can I reshape the output table into the following format?

     +-----------------------------------------------------------------+
     | carb          x1          x2         x3          x4          x5 |
     |-----------------------------------------------------------------|
  1. |    2   1.0416378    1.626417    1.68124   0.9502661   0.2923896 |
  2. |    2   0.7492482    1.334027    1.38885   0.6578765   0.5847791 |
  3. |    2   2.1380986    2.722878   2.777701   2.0467269   0.8040713 |
  4. |    2   2.1380986    2.722878   2.777701   2.0467269   0.8040713 |
  5. |    2   0.4934074    1.078186    1.13301   0.4020356     0.84062 |
     |-----------------------------------------------------------------|
  6. |    3          NA          NA         NA          NA          NA |
  7. |    4   0.4602308   0.8181881         NA          NA          NA |
  8. |    4   0.4602308   0.8181881         NA          NA          NA |
  9. |    4   1.2528505   0.8948932         NA          NA          NA |
 10. |    4   2.2500173     1.89206         NA          NA          NA |
     |-----------------------------------------------------------------|
 11. |    4   2.2500173     1.89206         NA          NA          NA |
 12. |    4    1.150577   0.7926197         NA          NA          NA |
 13. |    4   1.5085343    1.150577         NA          NA          NA |
 14. |    4   0.8693248   0.5113676         NA          NA          NA |
 15. |    6          NA          NA         NA          NA          NA |
     |-----------------------------------------------------------------|
 16. |    8          NA          NA         NA          NA          NA |
     +-----------------------------------------------------------------+

Explanation: for carb 2, the Mahalanobis distances are as follows:

           1        2        3         4         5
1 1.0416378 1.626417 1.681240 0.9502661 0.2923896
2 0.7492482 1.334027 1.388850 0.6578765 0.5847791
3 2.1380986 2.722878 2.777701 2.0467269 0.8040713
4 2.1380986 2.722878 2.777701 2.0467269 0.8040713
5 0.4934074 1.078186 1.133010 0.4020356 0.8406200

For carb 4: 
          1         2
1 0.4602308 0.8181881
2 0.4602308 0.8181881
3 1.2528505 0.8948932
4 2.2500173 1.8920600
5 2.2500173 1.8920600
6 1.1505770 0.7926197
7 1.5085343 1.1505770
8 0.8693248 0.5113676
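
Because only one variable (mpg) is involved, each distance reduces to an absolute difference divided by a standard deviation. The sketch below uses base R only and assumes the variance is estimated from the pooled vs == 0 and vs == 1 values within the group. Under that assumption it appears to reproduce the carb 2 values listed above. It is meant as a cross-check of the numbers, not as a description of how mahalanobis.dist is implemented.

```r
# Base R only; mtcars ships with R.
g <- mtcars[mtcars$carb == 2, ]
x <- g$mpg[g$vs == 0]   # one row of d per vs == 0 car
y <- g$mpg[g$vs == 1]   # one column of d per vs == 1 car

# Assumption: the variance comes from the pooled x and y values.
d <- abs(outer(x, y, "-")) / sd(c(x, y))

# Rows of d correspond to vs == 0 cars; t(d) gives the layout of the
# carb 2 block shown in the question.
t(d)
```

Changing the filter to carb == 4 gives an 8 x 2 matrix in the same way.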

For carb 3, carb 6, and carb 8, the Mahalanobis distance cannot be computed, so all columns are NA.
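
A plausible reason for the missing distances is that these groups contain cars from only one of the two vs classes, so one of the two inputs is empty. The following base R sketch counts the rows per carb and vs combination and lists the groups that lack either class; the variable names are illustrative only.

```r
# Cross-tabulate carb against vs using the built-in mtcars data.
counts <- table(carb = mtcars$carb, vs = mtcars$vs)
counts

# Groups where either vs == 0 or vs == 1 has no rows.
incomplete <- rownames(counts)[counts[, "0"] == 0 | counts[, "1"] == 0]
incomplete
```

Groups listed in incomplete have nothing to compare against, which is where the tryCatch fallback to NA applies.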

I can get this result with lapply and rbindlist, as follows:

rbindlist(
  lapply(unique(mtcars$carb), function(i) with(mtcars,
    data.frame(tryCatch(
      mahalanobis.dist(mpg[vs == 0 & carb == i], mpg[vs == 1 & carb == i]),
      error = function(e) as.numeric(NA))))),
  fill = TRUE)[, -c(6, 7, 8), with = FALSE]
           X1        X2        X3        X4        X5
 1: 1.0416378 0.7492482 2.1380986 2.1380986 0.4934074
 2: 1.6264169 1.3340273 2.7228777 2.7228777 1.0781865
 3: 1.6812399 1.3888504 2.7777008 2.7777008 1.1330095
 4: 0.9502661 0.6578765 2.0467269 2.0467269 0.4020356
 5: 0.2923896 0.5847791 0.8040713 0.8040713 0.8406200
 6:        NA        NA        NA        NA        NA
 7: 0.4602308 0.8181881        NA        NA        NA
 8: 0.4602308 0.8181881        NA        NA        NA
 9: 1.2528505 0.8948932        NA        NA        NA
10: 2.2500173 1.8920600        NA        NA        NA
11: 2.2500173 1.8920600        NA        NA        NA
12: 1.1505770 0.7926197        NA        NA        NA
13: 1.5085343 1.1505770        NA        NA        NA
14: 0.8693248 0.5113676        NA        NA        NA
15:        NA        NA        NA        NA        NA
16:        NA        NA        NA        NA        NA

I am looking for a solution that does not use lapply.

1 answer:

Answer 0 (score: 3)

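You can make the tryCatch block always return a value with the right dimensions and then rebuild the matrix afterwards. Note that there is an extra row of NA at the beginning, for carb = 1.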

res <- as.data.table(mtcars)[,tryCatch({
    mat <- mahalanobis.dist(mpg[vs == 0], mpg[vs == 1])
    t(cbind(mat, matrix(NA, nrow=nrow(mat), ncol=5-ncol(mat))))  # add in NA values to fill out columns
   }, error=function(e) rep(as.numeric(NA), 5)), keyby = carb]   # return 5-vector on error

matrix(res[[2]], ncol = 5, byrow = TRUE)                         # rebuild matrix
#            [,1]      [,2]      [,3]      [,4]      [,5]
#  [1,]        NA        NA        NA        NA        NA
#  [2,] 1.0416378 0.7492482 2.1380986 2.1380986 0.4934074
#  [3,] 1.6264169 1.3340273 2.7228777 2.7228777 1.0781865
#  [4,] 1.6812399 1.3888504 2.7777008 2.7777008 1.1330095
#  [5,] 0.9502661 0.6578765 2.0467269 2.0467269 0.4020356
#  [6,] 0.2923896 0.5847791 0.8040713 0.8040713 0.8406200
#  [7,]        NA        NA        NA        NA        NA
#  [8,] 0.4602308 0.8181881        NA        NA        NA
#  [9,] 0.4602308 0.8181881        NA        NA        NA
# [10,] 1.2528505 0.8948932        NA        NA        NA
# [11,] 2.2500173 1.8920600        NA        NA        NA
# [12,] 2.2500173 1.8920600        NA        NA        NA
# [13,] 1.1505770 0.7926197        NA        NA        NA
# [14,] 1.5085343 1.1505770        NA        NA        NA
# [15,] 0.8693248 0.5113676        NA        NA        NA
# [16,]        NA        NA        NA        NA        NA
# [17,]        NA        NA        NA        NA        NA
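
The pad, transpose, and rebuild steps can also be tried in isolation. The sketch below uses base R with small made-up matrices standing in for the per-group distance matrices. The helper pad_flat is hypothetical, and the width is computed from the inputs rather than hard-coded to 5; that is a variation on the answer, not part of it.

```r
# Hypothetical helper: pad a matrix with NA columns up to `width`,
# then flatten it row by row (hence the transpose before as.vector).
pad_flat <- function(mat, width) {
  filler <- matrix(NA_real_, nrow = nrow(mat), ncol = width - ncol(mat))
  as.vector(t(cbind(mat, filler)))
}

# Made-up stand-ins for two per-group distance matrices.
a <- matrix(1:6 / 10, nrow = 2)   # 2 x 3
b <- matrix(1:4 / 10, nrow = 2)   # 2 x 2

width <- max(ncol(a), ncol(b))

# A group that fails contributes a single all-NA row of length `width`.
flat <- c(pad_flat(a, width), pad_flat(b, width), rep(NA_real_, width))

# Every chunk of `width` values becomes one row again.
out <- matrix(flat, ncol = width, byrow = TRUE)
out
```

Flattening row by row matters because the rebuilt matrix is filled with byrow = TRUE; without the transpose, the values would land in the wrong cells.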