R: Calculating a speedup factor from a data frame

Date: 2016-03-06 09:19:32

Tags: r dataframe grouping

I have the following data frame:

Source: local data frame [18 x 3]
Groups: instance [?]

   instance          V2             wtime
     (fctr)      (fctr)             (dbl)
1    CCRG10  BranchDBMS         2.1845122
2    CCRG10  CacheDBMS          0.8619093
3    CCRG20  BranchDBMS         7.3522605
4    CCRG20  CacheDBMS          2.5523066
5    CCRG30  BranchDBMS        15.7318869
6    CCRG30  CacheDBMS          5.1411876
7    CCRG40  BranchDBMS        31.7315724
8    CCRG40  CacheDBMS          7.6714212
9    CCRG50  BranchDBMS        58.0909133
10   CCRG50  CacheDBMS         11.3979914
11   CCRG60  BranchDBMS        78.5095645
12   CCRG60  CacheDBMS         15.5988044
13   CCRG70  BranchDBMS        94.0637485
14   CCRG70  CacheDBMS         20.2977642
15   CCRG80  BranchDBMS       102.8716548
16   CCRG80  CacheDBMS         25.0142898
17   CCRG90  BranchDBMS       100.5247555
18   CCRG90  CacheDBMS         28.3753977

I want to transform this table into a new one like this:

Source: local data frame [9 x 2]
Groups: instance [?]

   instance           speedup
     (fctr)             (dbl)
1    CCRG10         2.5345035
...

For each instance, I want to divide the BranchDBMS wtime by the CacheDBMS wtime; for CCRG10 that is 2.18 / 0.86 = 2.53.
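The ratio for the first instance, computed in R from the full-precision values in the table above (a minimal check; the variable names are illustrative only):

```r
# wtime values for CCRG10, copied from the table above
branch_ccrg10 <- 2.1845122  # BranchDBMS
cache_ccrg10  <- 0.8619093  # CacheDBMS

# Speedup = BranchDBMS wall time divided by CacheDBMS wall time
speedup_ccrg10 <- branch_ccrg10 / cache_ccrg10
print(speedup_ccrg10)  # [1] 2.534504
```

This agrees with the 2.5345035 in the desired output; the 2.53 above simply comes from using rounded inputs.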

How can I automate this process?

1 Answer

Answer 0 (score: 2)

Judging from the output you posted, you seem to be managing your table with dplyr, so a tidyr approach is a natural fit.

Code

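# dplyr for mutate()/rename(), magrittr for %<>%, tidyr for spread()
library(dplyr)
library(magrittr)
library(tidyr)

# dta comes from the "Data import" step: V2 = instance, V3 = DBMS label, V4 = wtime
dta %<>%
    spread(key = V3, value = V4) %>%               # one column per DBMS label
    mutate(wtimRes = BranchDBMS / CacheDBMS) %>%   # speedup for each instance
    rename(instance = V2)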
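In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. A minimal sketch of the same reshape using it, run on the first three instances from the question and assuming the question's own column names (`instance`, `V2`, `wtime`) rather than the V2/V3/V4 names produced by the import below:

```r
library(dplyr)
library(tidyr)

# First three instances from the question, in the question's column names
dta_long <- data.frame(
  instance = rep(c("CCRG10", "CCRG20", "CCRG30"), each = 2),
  V2       = rep(c("BranchDBMS", "CacheDBMS"), times = 3),
  wtime    = c(2.1845122, 0.8619093, 7.3522605, 2.5523066,
               15.7318869, 5.1411876),
  stringsAsFactors = FALSE
)

speedup_tbl <- dta_long %>%
  # One column per V2 label, one row per instance
  pivot_wider(names_from = V2, values_from = wtime) %>%
  # Ratio of the two wall times for each instance
  mutate(speedup = BranchDBMS / CacheDBMS) %>%
  # Keep only the two columns the question asks for
  select(instance, speedup)

print(speedup_tbl)
```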

Result

> head(dta, 5)
  instance BranchDBMS  CacheDBMS  wtimRes
1   CCRG10   2.184512  0.8619093 2.534504
2   CCRG20   7.352260  2.5523066 2.880634
3   CCRG30  15.731887  5.1411876 3.059971
4   CCRG40  31.731572  7.6714212 4.136336
5   CCRG50  58.090913 11.3979914 5.096592

Gather

Of course, if you need to, you can `gather` the results back into long format, with one key column and one value column:

dta %<>%
    gather(key = key, value = value, -instance)

which produces:

> head(dta,6)
  instance        key     value
1   CCRG10 BranchDBMS  2.184512
2   CCRG20 BranchDBMS  7.352260
3   CCRG30 BranchDBMS 15.731887
4   CCRG40 BranchDBMS 31.731572
5   CCRG50 BranchDBMS 58.090913
6   CCRG60 BranchDBMS 78.509564
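With tidyr 1.0.0 or later, `pivot_longer()` replaces `gather()`. A sketch, assuming a small wide table shaped like the output of the spread/mutate step (first three instances only; `dta_wide` and `dta_back` are illustrative names):

```r
library(dplyr)
library(tidyr)

# Wide table shaped like the result of the spread()/mutate() step
dta_wide <- data.frame(
  instance   = c("CCRG10", "CCRG20", "CCRG30"),
  BranchDBMS = c(2.1845122, 7.3522605, 15.7318869),
  CacheDBMS  = c(0.8619093, 2.5523066, 5.1411876),
  stringsAsFactors = FALSE
) %>%
  mutate(wtimRes = BranchDBMS / CacheDBMS)

dta_back <- dta_wide %>%
  # Every column except instance becomes a key/value pair
  pivot_longer(-instance, names_to = "key", values_to = "value")

print(dta_back)
```

Unlike `gather()`, which stacks one column after another (all BranchDBMS rows first, as in the output above), `pivot_longer()` keeps the rows for each instance together.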

Data import

dtaTxt <- "   instance          V2             wtime
     (fctr)      (fctr)             (dbl)
1    CCRG10  BranchDBMS         2.1845122
2    CCRG10  CacheDBMS          0.8619093
3    CCRG20  BranchDBMS         7.3522605
4    CCRG20  CacheDBMS          2.5523066
5    CCRG30  BranchDBMS        15.7318869
6    CCRG30  CacheDBMS          5.1411876
7    CCRG40  BranchDBMS        31.7315724
8    CCRG40  CacheDBMS          7.6714212
9    CCRG50  BranchDBMS        58.0909133
10   CCRG50  CacheDBMS         11.3979914
11   CCRG60  BranchDBMS        78.5095645
12   CCRG60  CacheDBMS         15.5988044
13   CCRG70  BranchDBMS        94.0637485
14   CCRG70  CacheDBMS         20.2977642
15   CCRG80  BranchDBMS       102.8716548
16   CCRG80  CacheDBMS         25.0142898
17   CCRG90  BranchDBMS       100.5247555
18   CCRG90  CacheDBMS         28.3753977"

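# Skip the two header lines and drop the row-number column ("NULL"),
# so the remaining columns are read as V2 (instance), V3 (DBMS label), V4 (wtime)
dta <- read.table(textConnection(dtaTxt), header = FALSE,
                  colClasses = c("NULL", NA, NA, NA), skip = 2)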
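Because the question's data frame is already grouped by `instance`, the ratio can also be taken without reshaping at all. A minimal sketch, assuming the question's column names and exactly one BranchDBMS row and one CacheDBMS row per instance (again using only the first three instances):

```r
library(dplyr)

# First three instances from the question, in the question's column names
dta_q <- data.frame(
  instance = rep(c("CCRG10", "CCRG20", "CCRG30"), each = 2),
  V2       = rep(c("BranchDBMS", "CacheDBMS"), times = 3),
  wtime    = c(2.1845122, 0.8619093, 7.3522605, 2.5523066,
               15.7318869, 5.1411876),
  stringsAsFactors = FALSE
)

speedup_tbl <- dta_q %>%
  group_by(instance) %>%
  # Select each wtime by its label, so row order within a group does not matter;
  # assumes one row per label per instance
  summarise(speedup = wtime[V2 == "BranchDBMS"] / wtime[V2 == "CacheDBMS"])

print(speedup_tbl)
```

If an instance could have repeated or missing labels, the subsetting above would no longer yield a single value per group, so the reshaping approach is the safer choice in that case.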