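I need a new column of means, grouped by month, in which each month's mean is repeated on every row belonging to that month.
I have already computed the group means, but the result has only one value per month, and I can't work out how to repeat each value across that month's n rows.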
tn_1$GAVG = aggregate(tn_1$FATALITIES, list(tn_1$MONTH), mean)
This gives the following error:
Error in `$<-.data.frame`(`*tmp*`, GAVG, value = list(Group.1 = c("01", :
replacement has 12 rows, data has 6267
The new column must show, on each row, the mean for that row's month.
structure(list(FATALITIES = c(1L, 2L, 5L, 5L, 3L, 3L, 3L, 4L,
8L, 1L, 7L, 4L, 3L, 4L, 12L, 4L, 1L, 2L, 3L, 1L, 0L, 0L, 4L,
1L, 0L, 5L, 0L, 12L, 3L, 2L, 4L, 5L, 1L, 22L, 0L, 1L, 2L, 4L,
7L, 3L), MONTH = c("04", "04", "04", "04", "05", "05", "05",
"05", "05", "05", "05", "05", "05", "05", "06", "06", "06", "06",
"06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",
"06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06"
)), .Names = c("FATALITIES", "MONTH"), row.names = c(NA, 40L), class = "data.frame")
Answer 0 (score: 2)
Using dplyr:
library(dplyr)
dat %>% group_by(MONTH) %>% mutate(avg=mean(FATALITIES))
# A tibble: 40 x 3
# Groups: MONTH [3]
FATALITIES MONTH avg
<int> <chr> <dbl>
1 1 04 3.25
2 2 04 3.25
3 5 04 3.25
4 5 04 3.25
5 3 05 4.00
6 3 05 4.00
7 3 05 4.00
8 4 05 4.00
9 8 05 4.00
10 1 05 4.00
# ... with 30 more rows
Or, if you would rather use base R only:
dat$avg<- ave(dat$FATALITIES, dat$MONTH, FUN=mean)
On this 40-row sample, the latter is roughly 9 to 10 times faster:
library(microbenchmark)
microbenchmark(one=dat %>% group_by(MONTH) %>% mutate(avg=mean(FATALITIES)), two=ave(dat$FATALITIES, dat$MONTH, FUN=mean))
Unit: microseconds
expr min lq mean median uq max neval
one 3698.875 4018.193 4438.3810 4283.864 4650.8455 10019.83 100
two 265.885 326.586 458.7712 476.840 530.2735 820.31 100
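As for why the original attempt failed: aggregate() returns a summary data frame with one row per month (12 rows in the question), which cannot be assigned to a column of a 6267-row data frame. If you want to keep the aggregate() result, one possible way to spread it back over the rows is a lookup with match(). This is only a sketch; the small data frame below is a made-up stand-in for the question's data, and the name monthly is my own.

```r
# Hypothetical stand-in for the question's data (values are illustrative only)
dat <- data.frame(
  FATALITIES = c(1L, 2L, 5L, 5L, 3L, 3L),
  MONTH = c("04", "04", "04", "04", "05", "05"),
  stringsAsFactors = FALSE
)

# aggregate() collapses the data to one row per month,
# which is why assigning it directly as a column fails
monthly <- aggregate(FATALITIES ~ MONTH, data = dat, FUN = mean)
names(monthly)[2] <- "GAVG"

# match() finds each row's month in the summary, so every row
# receives its month's mean and the original row order is kept
dat$GAVG <- monthly$GAVG[match(dat$MONTH, monthly$MONTH)]
```

The result should be the same as what ave() produces; ave() just does the split and the repeat in a single call.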
Answer 1 (score: 1)
Try base R's ave:
df$MEAN=ave(df$FATALITIES,df$MONTH,FUN=mean)
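A quick way to convince yourself that ave() returns one value per row, constant within each month, is sketched below. The data frame is a hypothetical miniature, not the asker's data. Note that FUN = mean is ave()'s default, so it can be left out.

```r
# Hypothetical mini data frame standing in for df
df <- data.frame(
  FATALITIES = c(1, 2, 5, 5, 3, 3),
  MONTH = c("04", "04", "04", "04", "05", "05")
)

# FUN = mean is the default for ave(), so it is omitted here
df$MEAN <- ave(df$FATALITIES, df$MONTH)

# The result has one entry per row of df ...
stopifnot(length(df$MEAN) == nrow(df))

# ... and only one distinct value within each month
distinct_per_month <- tapply(df$MEAN, df$MONTH, function(v) length(unique(v)))
stopifnot(all(distinct_per_month == 1))
```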