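data.table "loses" factor levels under certain conditions

Time: 2014-01-16 17:27:04

Tags: r data.table

Using the data.table package, I am working with the following data frame, generated with reproduce(df):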

    outRes vars ts_length   BIAS
1       1t   sd         0 -0.046
2       1t   sd         3 -0.105 
3       1t   sd         6 -0.249
4       1t   sd         1 -0.024
5       1t   sd         1  1.246
6       1t   sd         6  0.885
7       1t   sd         1  0.280
46    day    sd         0 -0.061    
47    day    sd         3 -0.119
48    day    sd         6 -0.256
49    day    sd         1 -0.039
50    day    sd         1  1.239
51    day    sd         6  0.888
52    day    sd         1  0.253
268  month   LE         1 -0.085
269  month   LE         3 -0.147
270  month   LE         6 -0.305


df <- structure(list(outRes = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,3L, 3L, 3L), 
          .Label = c("1t", "day", "month"), class = "factor"),
           vars = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("H","LE", "sd", "sm2", "Ts2"), class = "factor"),
           ts_length = structure(c(1L, 3L, 4L, 2L, 2L, 4L, 2L, 2L, 3L,4L), .Label = c("0", "1", "3", "6"), class = "factor"), 
           BIAS = c(-0.046,-0.105, -0.249, -0.024, 1.246, 0.885, 0.28, -0.085, -0.147,-0.305)), 
          .Names = c("outRes", "vars", "ts_length", "BIAS"), class = "data.frame",
           row.names = c(1L, 2L, 3L, 4L, 5L, 6L,7L, 268L, 269L, 270L)) 

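First, I need to find the lowest value of df$BIAS within each group defined by df$vars and df$outRes. In the example above, for outRes = 1t and vars = sd, the smallest BIAS is -0.024, so I need to print ts_length = "1"; for outRes = day, I need ts_length = 0 for the smallest BIAS = -0.061. Using the data.table package, I can output the BIAS values with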
library(data.table)
dt = as.data.table(df)
dt[,min(abs(BIAS)),by="vars,outRes"]

which gives me the output

   vars outRes    V1
1:   sd     1t 0.024
2:  sm2     1t 2.615
3:  Ts2     1t 0.000
4:    H     1t 0.735
5:   LE     1t 0.018
6:   sd    day 0.039
7:  sm2    day 2.661 etc...

What I want is to get the df$ts_length value that corresponds to each V1. I tried

setkey(dt,outRes,vars,BIAS) 
dt[J(dt[,min(abs(BIAS)),by="outRes,vars"])]
       [V1== BIAS,list(ID,ts_length,BIAS,outRes,vars)]

Two of the 5 levels of $vars disappeared, giving these results:

   ts_length  BIAS outRes vars
1:         3 0.018     1t   LE
2:         0 2.615     1t  sm2
3:         6 0.000     1t  Ts2
4:         0 0.005    day   LE
5:         0 2.661    day  sm2

I am new to data.table and admit that I do not fully understand the code itself, so I also tried

setkey(dt,vars,outRes,BIAS) 
dt[J(dt[,min(abs(BIAS)),by="vars,outRes"])]
       [V1== BIAS,list(ts_length,BIAS,vars,outRes)]

But again I get only 3 levels. What is going wrong? How can I get all 5 levels of the factor vars instead of just 3?

1 answer:

Answer 0: (score: 1)

Thanks for the reproducible example. Try the following:

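setkey(dt, vars, outRes)

# CJ() builds every vars/outRes combination, so empty groups show up as NA.
# by = .EACHI is required in data.table >= 1.9.4 to run j once per row of i;
# older versions did this implicitly.
dt[ CJ(levels(vars), levels(outRes))
  , .SD[abs(BIAS) == min(abs(BIAS))]
  , by = .EACHI
  , .SDcols=c("BIAS", "ts_length")
]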

    vars outRes   BIAS ts_length
 1:    H     1t     NA        NA
 2:    H    day     NA        NA
 3:    H  month     NA        NA
 4:   LE     1t     NA        NA
 5:   LE    day     NA        NA
 6:   LE  month -0.085         1
 7:   sd     1t -0.024         1
 8:   sd    day     NA        NA
 9:   sd  month     NA        NA
10:  sm2     1t     NA        NA
11:  sm2    day     NA        NA
12:  sm2  month     NA        NA
13:  Ts2     1t     NA        NA
14:  Ts2    day     NA        NA
15:  Ts2  month     NA        NA
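
A note on why the original attempts dropped groups: min(abs(BIAS)) is never negative, but it was joined against the signed BIAS column, so a group can only match when its winning BIAS value is itself non-negative. That is consistent with the question's output, where every surviving BIAS is positive. The sketch below illustrates this on the sample data and shows one possible alternative using which.min(). It assumes a recent data.table version (for on = and nomatch = NULL), and the names mins, matched, best, all_combos and full are illustrative only, not from the original post.

```r
library(data.table)

# Same values as the question's dput(), rebuilt with data.table() for brevity.
dt <- data.table(
  outRes    = factor(c(rep("1t", 7), rep("month", 3)),
                     levels = c("1t", "day", "month")),
  vars      = factor(c(rep("sd", 7), rep("LE", 3)),
                     levels = c("H", "LE", "sd", "sm2", "Ts2")),
  ts_length = factor(c("0", "3", "6", "1", "1", "6", "1", "1", "3", "6"),
                     levels = c("0", "1", "3", "6")),
  BIAS      = c(-0.046, -0.105, -0.249, -0.024, 1.246, 0.885, 0.28,
                -0.085, -0.147, -0.305)
)

# Diagnosis: join the per-group minimum of abs(BIAS) back onto the signed BIAS.
# Groups whose closest-to-zero BIAS is negative cannot match and are dropped.
mins    <- dt[, .(V1 = min(abs(BIAS))), by = .(vars, outRes)]
matched <- dt[mins, on = .(vars, outRes, BIAS = V1), nomatch = NULL]

# Alternative: pick the row with the smallest abs(BIAS) inside each group,
# which keeps the original sign and the matching ts_length.
best <- dt[, .SD[which.min(abs(BIAS))], by = .(vars, outRes)]

# Optional: list every vars/outRes combination, with NA for empty groups.
all_combos <- CJ(vars = levels(dt$vars), outRes = levels(dt$outRes))
full <- best[all_combos, on = .(vars, outRes)]
```

Unlike the abs(BIAS) == min(abs(BIAS)) filter, which.min() returns a single row per group even when there are ties, so which of the two fits depends on whether tied rows should all be reported.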