data.table: filtering inside a computed column

Date: 2014-02-02 00:21:59

Tags: r data.table

I'm trying to compute a new column with data.table.

The goal is a `speedup` column that gives each runtime relative to the 1-thread runtime.

    setup       mode name threads runtime
 1:     A      short    K       1      10
 2:     A      short    K       1      11
 3:     A      short    K       1      10
 4:     A      short    K       2       4
 5:     A      short    K       2       5
 6:     A      short    K       2       8
 7:     B      short    K       1      11
 8:     B      short    K       1      12
 9:     B      short    K       1      10
10:     B      short    K       2       9
11:     B      short    K       2       6
12:     B      short    K       2       8
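Spelled out as a formula (the notation is introduced here for illustration only): for row r with runtime t_r, let g(r) be its (setup, name, mode) combination. The speedup divides that group's mean 1-thread runtime by t_r. Applied to setup A in the table:

```latex
\[
\text{speedup}_r = \frac{\bar{t}^{(1)}_{g(r)}}{t_r},
\qquad
\bar{t}^{(1)}_{g} = \operatorname{mean}\{\, t_s : g(s) = g,\ \text{threads}_s = 1 \,\}
\]
\[
\bar{t}^{(1)}_{A} = \frac{10 + 11 + 10}{3} = \frac{31}{3} \approx 10.33,
\qquad
\text{speedup}_1 = \frac{31/3}{10} \approx 1.03,
\qquad
\text{speedup}_4 = \frac{31/3}{4} \approx 2.58
\]
```

For reference, setup B's 1-thread mean is (11 + 12 + 10)/3 = 11. Dividing that value by the runtimes of rows 1 and 4 gives 1.1 and 2.75.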

This is what I have so far:

    valT[, speedup:=mean(runtime)/runtime, by=c("setup","threads","name","mode")]

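Of course, the resulting speedup is not what I want. For example, the speedup in the first row should be 1.1, and in the fourth row 2.75. That is why I need to narrow the selection. `which` seemed to be the answer, but I can't get it to work: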

    valT[, speedup:=mean(runtime)/runtime, which(threads==1), by=c("setup","threads","name","mode")]
    Error in `[.data.table`(valT, , runtime/mean(runtime), which(threads ==  : 
      Provide either 'by' or 'keyby' but not both
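Why this error appears: the formals of `[.data.table` begin `x, i, j, by, keyby, ...`. Because `by` is supplied by name, the unnamed `which(threads==1)` is matched positionally to the next free formal, `keyby`, so the call ends up with both `by` and `keyby`. (`which` is a separate logical argument of `[.data.table` that returns row numbers instead of a result; it is not a filter.) Moving the condition into `i` is legal, but `i` restricts both the rows that are grouped and the rows that are assigned. The sketch below uses a hypothetical four-row table `dt` with a hypothetical group column `g` to show that only the 1-thread rows then receive a value:

```r
library(data.table)

# Hypothetical toy table: one group, two 1-thread runs and two 2-thread runs
dt <- data.table(g = "A", threads = c(1L, 1L, 2L, 2L), runtime = c(10, 12, 4, 6))

# Condition in i: only the threads == 1 rows are grouped AND assigned,
# so the new column stays NA on the 2-thread rows
dt[threads == 1, speedup := mean(runtime) / runtime, by = g]
print(dt)
```

The 1-thread rows get 11/10 and 11/12, and the 2-thread rows, which are the ones that actually matter, are left as `NA`.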

Data:

    library(data.table)
    valT = data.table(structure(list(setup = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
        mode = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        1L, 1L), .Label = "     short", class = "factor"), name = structure(c(1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "  K", class = "factor"), 
        threads = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L
        ), runtime = c(10, 11, 10, 4, 5, 8, 11, 12, 10, 9, 6, 8)), .Names = c("setup", 
    "mode", "name", "threads", "runtime"), class = "data.frame", row.names = c(NA, 
    -12L)))
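With this data, the problem with the first attempt can be checked directly. Because `threads` is one of the `by` columns, each group's `mean(runtime)` covers only runs with that same thread count. The 1-thread rows therefore already get the intended value, but the 2-thread rows are divided into their own 2-thread mean. The sketch rebuilds the same values with plain strings instead of the padded factor labels from the `dput` output. That substitution is an assumption for brevity and does not change the grouping:

```r
library(data.table)

# Same values as the posted data (plain strings instead of the padded factor labels)
valT <- data.table(
  setup   = rep(c("A", "B"), each = 6),
  mode    = "short",
  name    = "K",
  threads = rep(rep(1:2, each = 3), times = 2),
  runtime = c(10, 11, 10, 4, 5, 8, 11, 12, 10, 9, 6, 8)
)

# First attempt from the question: threads is a grouping column,
# so every group's mean only covers runs with that same thread count
valT[, speedup := mean(runtime) / runtime, by = c("setup", "threads", "name", "mode")]

# Row 1 (1 thread):  (10 + 11 + 10) / 3 / 10 = 31/30, the intended value
# Row 4 (2 threads): (4 + 5 + 8) / 3 / 4 = 17/12, relative to the 2-thread mean
print(valT[c(1, 4)])
```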

1 answer:

Answer (score: 3)

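This works: drop `threads` from the grouping columns and, inside `j`, take the mean over the 1-thread runs only: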

    valT[, speedup := mean(runtime[threads == 1]) / runtime,
         by = c("setup","name","mode")]
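A self-contained run of the answer's line follows. It rebuilds the posted values with plain strings, which is an assumption, as above. It also shows an alternative sketch that is not part of the answer: compute the 1-thread baseline once per group, then update-join it back with `on=`. The names `baseline_tbl`, `baseline` and `speedup2` are hypothetical.

```r
library(data.table)

valT <- data.table(
  setup   = rep(c("A", "B"), each = 6),
  mode    = "short",
  name    = "K",
  threads = rep(rep(1:2, each = 3), times = 2),
  runtime = c(10, 11, 10, 4, 5, 8, 11, 12, 10, 9, 6, 8)
)

# The answer: group WITHOUT threads; inside each group, average only the 1-thread runs
valT[, speedup := mean(runtime[threads == 1]) / runtime,
     by = c("setup", "name", "mode")]

# Alternative sketch: explicit per-group baseline table, then an update join
baseline_tbl <- valT[threads == 1, .(baseline = mean(runtime)), by = .(setup, name, mode)]
valT[baseline_tbl, on = c("setup", "name", "mode"), speedup2 := i.baseline / runtime]

# Setup A baseline is 31/3: row 1 -> 31/30 (about 1.03), row 4 -> 31/12 (about 2.58)
# Setup B baseline is 11:   row 10 -> 11/9 (about 1.22)
print(valT)
```

Both columns hold the same values here. They differ only when a group has no 1-thread runs. In that case `mean(runtime[threads == 1])` is `mean(numeric(0))`, which is `NaN`, whereas the update join leaves those rows `NA` because they have no match.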