Difference between members within a group

Date: 2016-03-03 19:23:10

Tags: r dplyr

I have measured different treatments over several rounds of a trial, as follows:

set.seed(1)
df <- data.frame(treatment = rep(c('baseline', 'treatment 1', 'treatment 2'), 
                                 times=5),
                 round = rep(1:5, each=3),
                 measurement1 = rep(1:5, each=3) + rnorm(15),
                 measurement2 = rep(1:5, each=3) + rnorm(15))

df

#      treatment round measurement1 measurement2
# 1     baseline     1    0.3735462    0.9550664
# 2  treatment 1     1    1.1836433    0.9838097
# 3  treatment 2     1    0.1643714    1.9438362
# 4     baseline     2    3.5952808    2.8212212
# 5  treatment 1     2    2.3295078    2.5939013
# 6  treatment 2     2    1.1795316    2.9189774
# 7     baseline     3    3.4874291    3.7821363
# 8  treatment 1     3    3.7383247    3.0745650
# 9  treatment 2     3    3.5757814    1.0106483
# 10    baseline     4    3.6946116    4.6198257
# 11 treatment 1     4    5.5117812    3.9438713
# 12 treatment 2     4    4.3898432    3.8442045
# 13    baseline     5    4.3787594    3.5292476
# 14 treatment 1     5    2.7853001    4.5218499
# 15 treatment 2     5    6.1249309    5.4179416

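What I want is a data.frame containing, for each round, the differences in both measurements between each treatment and that round's baseline. That is, grouping by round, I want to subtract the corresponding baseline value from each of the two measurements.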
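To make the target concrete, here is the arithmetic for round 1 alone. This is a small illustrative sketch using the values printed above (rounded to seven decimals); the variable names baseline, treatment_1 and treatment_2 are just labels chosen for this example.

```r
# Round 1 values copied from the printed df above
baseline    <- c(measurement1 = 0.3735462, measurement2 = 0.9550664)
treatment_1 <- c(measurement1 = 1.1836433, measurement2 = 0.9838097)
treatment_2 <- c(measurement1 = 0.1643714, measurement2 = 1.9438362)

# Subtract the baseline measurements from each treatment's measurements
diff_t1 <- treatment_1 - baseline  # 0.8100971, 0.0287433
diff_t2 <- treatment_2 - baseline  # -0.2091748, 0.9887698

print(rbind(`treatment 1` = diff_t1, `treatment 2` = diff_t2))
```

The desired result has one such row per non-baseline treatment in every round, i.e. 10 rows for the 5 rounds.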

I'd prefer a dplyr solution if one exists, but I'll accept anything elegant.

2 Answers:

Answer 0 (score: 5)

You can use mutate_each (note that mutate_each() and funs() have since been deprecated in dplyr):

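library(dplyr)
df %>%
  group_by(round) %>%
  # within each round, subtract the baseline value from each measurement column
  mutate_each(funs(. - .[treatment=="baseline"]), -treatment) %>%
  # the baseline rows are now all 0, so drop them
  filter(treatment!="baseline")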
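The key expression is . - .[treatment=="baseline"]: inside one round, .[treatment=="baseline"] picks out a single baseline value, and R recycles that length-one vector across the whole group. The baseline row therefore becomes 0, which is why the final filter() is needed. A minimal base-R illustration for a single group, using round 2's measurement1 values from the question (the names x and d are illustrative):

```r
# One group (round 2) from the question's data
treatment <- c("baseline", "treatment 1", "treatment 2")
x <- c(3.5952808, 2.3295078, 1.1795316)  # measurement1 in round 2

# x[treatment == "baseline"] has length 1, so it is recycled over all of x
d <- x - x[treatment == "baseline"]
print(d)

# The baseline entry is exactly 0; the other two are the differences we want
print(d[treatment != "baseline"])
```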

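which gives the following. (This output was produced from the answerer's own simulated data, which is why the labels read treatment1 and the values differ from the question's df.)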

Source: local data frame [10 x 4]
Groups: round [5]

    treatment round measurement1 measurement2
       (fctr) (int)        (dbl)        (dbl)
1  treatment1     1     1.558820   -0.6584485
2  treatment2     1    -0.068677    1.3364462
3  treatment1     2     1.769312   -0.2732490
4  treatment2     2     0.801357   -1.4852449
5  treatment1     3    -1.064394   -1.1513703
6  treatment2     3     2.433222   -0.7939903
7  treatment1     4     0.448744    0.1394982
8  treatment2     4    -1.066922   -1.1410085
9  treatment1     5     1.182761   -0.8311095
10 treatment2     5     0.138005    0.2622119

If you want to add the differences to the data frame as new columns (as @akrun does in his dplyr/tidyr alternative), you can also do:

df %>%
  group_by(round) %>%
  # keep the original measurements and add the differences as new columns
  mutate(diff1 = measurement1 - measurement1[treatment=="baseline"],
         diff2 = measurement2 - measurement2[treatment=="baseline"]) %>%
  filter(treatment!="baseline")
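For comparison, the same group-wise logic (add diff1/diff2 within each round, then drop the baseline rows) can be sketched in base R with split() and lapply(). This is an alternative to the answer's dplyr code, not what it does internally, and it assumes every round contains exactly one baseline row.

```r
set.seed(1)
df <- data.frame(treatment = rep(c('baseline', 'treatment 1', 'treatment 2'),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

# Process each round separately, then stack the pieces back together
res <- do.call(rbind, lapply(split(df, df$round), function(g) {
  is_base <- g$treatment == "baseline"  # assumes exactly one TRUE per round
  g$diff1 <- g$measurement1 - g$measurement1[is_base]
  g$diff2 <- g$measurement2 - g$measurement2[is_base]
  g[!is_base, ]                         # keep only the non-baseline rows
}))
rownames(res) <- NULL
print(res)
```

Because this runs on the question's df, the diff columns will not match the answer's printed output, which came from different data.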

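which gives the following (output produced from the answerer's own simulated data rather than the question's df):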

Source: local data table [10 x 6]

    treatment round measurement1 measurement2     diff1      diff2
       (fctr) (int)        (dbl)        (dbl)     (dbl)      (dbl)
1  treatment1     1     2.630392    -0.104258  1.558820 -0.6584485
2  treatment2     1     1.002895     1.890637 -0.068677  1.3364462
3  treatment1     2     3.822473     3.147443  1.769312 -0.2732490
4  treatment2     2     2.854518     1.935447  0.801357 -1.4852449
5  treatment1     3     1.520553     3.291122 -1.064394 -1.1513703
6  treatment2     3     5.018169     3.648502  2.433222 -0.7939903
7  treatment1     4     4.956380     4.544908  0.448744  0.1394982
8  treatment2     4     3.440714     3.264401 -1.066922 -1.1410085
9  treatment1     5     4.672056     5.082310  1.182761 -0.8311095
10 treatment2     5     3.627300     6.175631  0.138005  0.2622119

Answer 1 (score: 3)

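We can use data.table:

library(data.table)
# sort so that "baseline" is the first row in each round, then subtract it
# from the last two rows; .SDcols = 3:4 selects measurement1 and measurement2
setDT(df)[order(round, treatment), tail(.SD, 2) - head(.SD, 1)[rep(1, 2)],
          by = round, .SDcols = 3:4]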
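The line above relies on ordering: after order(round, treatment), "baseline" sorts before "treatment 1" and "treatment 2", so head(.SD, 1) is the baseline row and tail(.SD, 2) are the two treatments. It therefore assumes exactly three rows per round. A base-R sketch of the same subtract-the-first-row trick, using sweep() (the names cols, sorted and res are illustrative); like the data.table version, it returns the differences without the treatment labels:

```r
set.seed(1)
df <- data.frame(treatment = rep(c('baseline', 'treatment 1', 'treatment 2'),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

cols <- c("measurement1", "measurement2")
# Sort so that, within each round, the baseline row comes first
sorted <- df[order(df$round, as.character(df$treatment)), ]

res <- do.call(rbind, lapply(split(sorted, sorted$round), function(g) {
  m <- as.matrix(g[, cols])
  # Row 1 is the baseline; subtract it column-wise from the remaining rows
  sweep(m[-1, , drop = FALSE], 2, m[1, ])
}))
print(res)
```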

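Another data.table option is:

# for each measurement column, subtract the baseline value within each round
setDT(df)[, lapply(.SD[, grep("^measurement", names(.SD)), with = FALSE],
                   function(x) x[treatment != "baseline"] - x[treatment == "baseline"]),
          by = round]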

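Or with dplyr/tidyr:

library(dplyr)
library(tidyr)
# reshape to one row per round and measurement, with one column per treatment;
# here diff1/diff2 are the treatment 1 / treatment 2 differences from baseline
gather(df, var, val, measurement1:measurement2) %>%
  spread(treatment, val) %>%
  mutate(diff1 = `treatment 1` - baseline,
         diff2 = `treatment 2` - baseline)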
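The tidyr route works by reshaping: after gather() and spread() there is one row per round and measurement with one column per treatment, so each difference becomes a plain column subtraction. A base-R sketch of the same wide layout, using tapply() to build a round-by-treatment matrix for each measurement. It assumes exactly one value per round and treatment; the names wide_diffs and res are illustrative, and the row order differs from the tidyr output.

```r
set.seed(1)
df <- data.frame(treatment = rep(c('baseline', 'treatment 1', 'treatment 2'),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

wide_diffs <- lapply(c("measurement1", "measurement2"), function(v) {
  # Rows = rounds, columns = treatments; each cell holds exactly one value
  m <- tapply(df[[v]], list(df$round, df$treatment), identity)
  data.frame(round = as.integer(rownames(m)),
             var   = v,
             diff1 = unname(m[, "treatment 1"] - m[, "baseline"]),
             diff2 = unname(m[, "treatment 2"] - m[, "baseline"]))
})
res <- do.call(rbind, wide_diffs)
print(res)
```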