How can I subtract one row from multiple rows by group, for a data set with multiple columns in R?

Asked: 2017-05-10 17:41:38

Tags: r

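I would like to learn how to subtract one row from multiple rows by group, and save the result as a data table/matrix in R. For example, take the following data frame: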

data.frame("patient" = c("a","a","a", "b","b","b","c","c","c"), "Time" = c(1,2,3), "Measure 1" = sample(1:100,size = 9,replace = TRUE), "Measure 2" = sample(1:100,size = 9,replace = TRUE), "Measure 3" = sample(1:100,size = 9,replace = TRUE))

       patient  Time  Measure.1  Measure.2  Measure.3
    1       a    1        19         5        75
    2       a    2        64        20        74
    3       a    3        40         4        78
    4       b    1        80        91        80
    5       b    2        48        31        73
    6       b    3        10         5         4
    7       c    1        30        67        55
    8       c    2        24        13        90
    9       c    3        45        31        88

For each patient, I want to subtract the row where Time == 1 from all rows belonging to that patient. The result would be:

       patient  Time  Measure.1  Measure.2  Measure.3
    1       a    1        0         0         0
    2       a    2        45        15       -1
    3       a    3        21       -1         3
    4       b    1        0         0         0
    5       b    2       -32       -60       -7
    6       b    3       -70       -86       -76
    7       c    1        0         0         0
    ....

I tried the following code using the dplyr package, but to no avail:

raw_patient<- group_by(rawdata,patient, Time)  
baseline_patient <-mutate(raw_patient,cpls = raw_patient[,]- raw_patient["Time" == 0,])

1 answer:

Answer 0 (score: 4)

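Because there are multiple measure columns, we can use mutate_at and specify the variables with vars. After grouping by "patient", we subtract from each column the element that corresponds to Time == 1.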

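library(dplyr)
# Group by patient so that Time == 1 picks out that patient's baseline row
df1 %>% 
   group_by(patient) %>%
   # For every column whose name matches "Measure", subtract the baseline value
   mutate_at(vars(matches("Measure")), funs(.- .[Time==1]))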
# A tibble: 9 × 5
# Groups: patient [3]
#  patient  Time Measure.1 Measure.2 Measure.3
#    <chr> <int>     <int>     <int>     <int>
#1       a     1         0         0         0
#2       a     2        45        15        -1
#3       a     3        21        -1         3
#4       b     1         0         0         0
#5       b     2       -32       -60        -7
#6       b     3       -70       -86       -76
#7       c     1         0         0         0
#8       c     2        -6       -54        35
#9       c     3        15       -36        33
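
For readers on newer dplyr releases: funs() has been deprecated since dplyr 0.8.0, and from dplyr 1.0.0 onward mutate_at() is superseded by across(). The following is a minimal sketch of the same idea written with across(); it assumes each patient has exactly one row with Time == 1, and the name `result` is only illustrative.

```r
library(dplyr)

# Same values as the example data in this post
df1 <- data.frame(
  patient   = rep(c("a", "b", "c"), each = 3),
  Time      = rep(1:3, times = 3),
  Measure.1 = c(19L, 64L, 40L, 80L, 48L, 10L, 30L, 24L, 45L),
  Measure.2 = c(5L, 20L, 4L, 91L, 31L, 5L, 67L, 13L, 31L),
  Measure.3 = c(75L, 74L, 78L, 80L, 73L, 4L, 55L, 90L, 88L)
)

result <- df1 %>%
  group_by(patient) %>%
  # Within each patient, subtract the Time == 1 value from every Measure column
  # (assumes exactly one Time == 1 row per patient)
  mutate(across(starts_with("Measure"), ~ .x - .x[Time == 1])) %>%
  ungroup()

print(result)
```

If a patient had no Time == 1 row, or more than one, the subtraction would fail or recycle in unintended ways, so it may be worth checking that assumption first.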

Data

df1 <- structure(list(patient = c("a", "a", "a", "b", "b", "b", "c", 
"c", "c"), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Measure.1 = c(19L, 
64L, 40L, 80L, 48L, 10L, 30L, 24L, 45L), Measure.2 = c(5L, 20L, 
4L, 91L, 31L, 5L, 67L, 13L, 31L), Measure.3 = c(75L, 74L, 78L, 
80L, 73L, 4L, 55L, 90L, 88L)), .Names = c("patient", "Time", 
"Measure.1", "Measure.2", "Measure.3"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9"))
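
As a cross-check that does not depend on dplyr, the same subtraction can be sketched in base R by looking up each row's baseline with match(). This is a different technique from the answer above, not the answerer's method; it assumes exactly one Time == 1 row per patient, and the names `measure_cols`, `baseline`, `idx`, and `out` are illustrative.

```r
# Same values as the example data in this post
df1 <- data.frame(
  patient   = rep(c("a", "b", "c"), each = 3),
  Time      = rep(1:3, times = 3),
  Measure.1 = c(19L, 64L, 40L, 80L, 48L, 10L, 30L, 24L, 45L),
  Measure.2 = c(5L, 20L, 4L, 91L, 31L, 5L, 67L, 13L, 31L),
  Measure.3 = c(75L, 74L, 78L, 80L, 73L, 4L, 55L, 90L, 88L)
)

# Columns to adjust: every column whose name starts with "Measure"
measure_cols <- grep("^Measure", names(df1), value = TRUE)

# One baseline row per patient (assumes exactly one Time == 1 row each)
baseline <- df1[df1$Time == 1, ]

# For every row of df1, the position of its patient's baseline row
idx <- match(df1$patient, baseline$patient)

out <- df1
for (col in measure_cols) {
  # Subtract the matching baseline value from each row
  out[[col]] <- df1[[col]] - baseline[[col]][idx]
}

print(out)
```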