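I'm trying to compute rolling sums of multiple variables across multiple subjects in a data set. I know how to do this with the plyr package; however, given the length of the data set, the number of variables, and the number of different rolling sums I'm trying to compute (2-day, 3-day, 4-day, etc.), I'd like to know whether anyone has a more time-efficient way to do this in dplyr.
My data look similar to this: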
Subjects <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
Day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
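# n = length(Day) states the intent explicitly (rnorm() would otherwise use the length of the vector passed as n)
variable.A <- rnorm(n = length(Day), mean = 20, sd = 5)
variable.B <- rnorm(n = length(Day), mean = 50, sd = 15)
variable.C <- rnorm(n = length(Day), mean = 100, sd = 33)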
dat <- data.frame(Subjects, Day, variable.A, variable.B, variable.C)
dat
   Subjects Day variable.A variable.B variable.C
 1        1   1   20.17676   72.44022   56.69915
 2        1   2   14.11462   46.28473  117.00864
 3        1   3   15.30440   72.43752   93.17489
 4        1   4   13.72422   66.76744  101.26422
 5        1   5   21.97695   69.50480  102.61979
 6        2   1   14.45742   32.69106   82.37268
 7        2   2   33.37783   65.06782   97.17744
 8        2   3   13.57833   26.37183   89.38218
 9        2   4   23.01717   55.83446  147.85362
10        2   5   14.06008   32.00396   48.73060
11        3   1   14.57199   60.29746   87.07977
12        3   2   15.77413   77.04517  132.17910
13        3   3   30.05661   30.62220  171.35998
14        3   4   24.65348   53.96450   74.99875
15        3   5   26.93699   57.06393   36.81901
Here is an example of the code I've tried:
library(plyr)
library(RcppRoll)
summarize <- ddply(dat, "Subjects", mutate,
                   Two.Day.Roll.A = roll_sum(variable.A, 2, align = "right", fill = NA),
                   Two.Day.Roll.B = roll_sum(variable.B, 2, align = "right", fill = NA),
                   Two.Day.Roll.C = roll_sum(variable.C, 2, align = "right", fill = NA))
   Subjects Day variable.A variable.B variable.C Two.Day.Roll.A Two.Day.Roll.B Two.Day.Roll.C
 1        1   1  15.324798   24.83074  137.48853             NA             NA             NA
 2        1   2  12.112943   58.86094   86.87454       27.43774       83.69168       224.3631
 3        1   3  16.179328   57.95450   68.71333       28.29227      116.81544       155.5879
 4        1   4  15.319750   38.13721   79.43194       31.49908       96.09171       148.1453
 5        1   5  21.791452   61.99368  134.30205       37.11120      100.13089       213.7340
 6        2   1  10.937461   63.83164   95.04865             NA             NA             NA
 7        2   2  14.642376   79.12452  107.13699       25.57984      142.95616       202.1856
 8        2   3  17.519905   52.75490  100.62811       32.16228      131.87942       207.7651
 9        2   4  23.190371   37.56950  179.72763       40.71028       90.32440       280.3557
10        2   5  13.729350   46.95616   72.14179       36.91972       84.52566       251.8694
11        3   1   9.609171   74.51140  130.90005             NA             NA             NA
12        3   2  27.542897   14.36222  133.87630       37.15207       88.87363       264.7763
13        3   3  18.750015   60.46183  130.44314       46.29291       74.82405       264.3194
14        3   4  17.461882   52.65797  176.30620       36.21190      113.11979       306.7493
15        3   5  31.244564   62.41614   78.82916       48.70645      115.07411       255.1354
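This works fine but, as I said, the original data has many more columns, and I'd like to go on to compute 3-day sums, 4-day sums, etc. across all of those variables. Also, my original data contains some NAs, so maybe there's a way to handle those as well?
I tried using the mutate_each() function from the dplyr package, but I can't seem to get the syntax right.
Thanks.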
Answer 0 (score: 2)
Here's the dplyr version:
library(dplyr)
library(RcppRoll)
dat %>%
  group_by(Subjects) %>%
  mutate_each(funs(roll_sum(., 2, align = "right", fill = NA)), -Subjects, -Day)
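Note that mutate_each() and funs() have since been deprecated in dplyr. The question also asks about several window widths and about NAs, which the snippet above does not cover. The following is a minimal sketch of one way to handle both, assuming dplyr 1.0.0 or later (for across()). The names windows, roll_funs and result, the "roll2"/"roll3"/"roll4" suffixes, and the seeded toy data are illustrative choices, not part of the original post. Passing na.rm = TRUE to roll_sum() drops missing values inside each window; whether a missing day should count as contributing nothing is an assumption you would need to check against your own data.

```r
library(dplyr)
library(RcppRoll)

# Toy data shaped like the question's (seeded here only so the sketch is reproducible)
set.seed(1)
dat <- data.frame(
  Subjects   = rep(1:3, each = 5),
  Day        = rep(1:5, times = 3),
  variable.A = rnorm(15, mean = 20, sd = 5),
  variable.B = rnorm(15, mean = 50, sd = 15),
  variable.C = rnorm(15, mean = 100, sd = 33)
)

windows <- 2:4  # rolling window widths in days (hypothetical choice)

# Build one right-aligned rolling-sum function per window width
roll_funs <- lapply(windows, function(w) {
  force(w)  # make sure each closure keeps its own window width
  function(x) roll_sum(x, n = w, align = "right", fill = NA, na.rm = TRUE)
})
names(roll_funs) <- paste0("roll", windows)

# Apply every function to every "variable.*" column, within each subject
result <- dat %>%
  group_by(Subjects) %>%
  mutate(across(starts_with("variable"), roll_funs, .names = "{.col}.{.fn}")) %>%
  ungroup()
```

This adds columns such as variable.A.roll2 and variable.C.roll4, so adding another window width only means changing windows. The first w - 1 rows of each subject remain NA for a window of width w, as in the original output.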