Calculate group means and fill them into a new column in R

Asked: 2019-05-16 23:55:21

Tags: r calculated-columns

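I need a new column of means, grouped by month, where each month's mean is repeated on every row belonging to that month (n times for a month with n rows).

I have computed the group means, but the result holds only one value per month, and I cannot get each value repeated across the rows of its month.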

tn_1$GAVG = aggregate(tn_1$FATALITIES, list(tn_1$MONTH), mean)

This produces the following error:

 Error in `$<-.data.frame`(`*tmp*`, GAVG, value = list(Group.1 = c("01",  : 
 replacement has 12 rows, data has 6267

The new column must show the mean for each month. Sample data:

 structure(list(FATALITIES = c(1L, 2L, 5L, 5L, 3L, 3L, 3L, 4L, 
8L, 1L, 7L, 4L, 3L, 4L, 12L, 4L, 1L, 2L, 3L, 1L, 0L, 0L, 4L, 
1L, 0L, 5L, 0L, 12L, 3L, 2L, 4L, 5L, 1L, 22L, 0L, 1L, 2L, 4L, 
7L, 3L), MONTH = c("04", "04", "04", "04", "05", "05", "05", 
"05", "05", "05", "05", "05", "05", "05", "06", "06", "06", "06", 
"06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", 
"06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06"
)), .Names = c("FATALITIES", "MONTH"), row.names = c(NA, 40L), class = "data.frame")

2 Answers:

Answer 0: (score: 2)

Using dplyr (with the sample data stored in dat):

library(dplyr)

dat %>% group_by(MONTH) %>% mutate(avg = mean(FATALITIES))
# A tibble: 40 x 3
# Groups:   MONTH [3]
   FATALITIES MONTH   avg
        <int> <chr> <dbl>
 1          1 04     3.25
 2          2 04     3.25
 3          5 04     3.25
 4          5 04     3.25
 5          3 05     4.00
 6          3 05     4.00
 7          3 05     4.00
 8          4 05     4.00
 9          8 05     4.00
10          1 05     4.00
# ... with 30 more rows

Or, if you would rather use base R only:

dat$avg<- ave(dat$FATALITIES, dat$MONTH, FUN=mean)
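
To see what ave returns, here is a minimal sketch on a small made-up data frame (the name toy and its values are illustrative, not taken from the question). ave returns a vector as long as its input, with each group's mean placed at that group's positions, which is why it can be assigned directly as a column.

```r
# Toy data: hypothetical values for illustration only
toy <- data.frame(
  FATALITIES = c(1L, 2L, 5L, 5L, 3L, 4L),
  MONTH      = c("04", "04", "04", "04", "05", "05"),
  stringsAsFactors = FALSE
)

# ave() returns one value per input row: the mean of that row's group
toy$avg <- ave(toy$FATALITIES, toy$MONTH, FUN = mean)

print(toy)
```

Here the four rows for month 04 all get 3.25 and the two rows for month 05 get 3.5, so the column length matches the data frame.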

The latter is roughly 10 times faster in this benchmark:

library(microbenchmark)

microbenchmark(
  one = dat %>% group_by(MONTH) %>% mutate(avg = mean(FATALITIES)),
  two = ave(dat$FATALITIES, dat$MONTH, FUN = mean)
)
Unit: microseconds
 expr      min       lq      mean   median        uq      max neval
  one 3698.875 4018.193 4438.3810 4283.864 4650.8455 10019.83   100
  two  265.885  326.586  458.7712  476.840  530.2735   820.31   100

Answer 1: (score: 1)

Try base R's ave:

df$MEAN=ave(df$FATALITIES,df$MONTH,FUN=mean)
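
As for why the original attempt fails: aggregate returns a summary data frame with one row per month, and that cannot be assigned as a column of a longer data frame. If you want to keep aggregate, one possible approach is to look up each row's month in that summary with match. A sketch on made-up data (the names toy and monthly are illustrative, not from the question):

```r
# Toy data: hypothetical values for illustration only
toy <- data.frame(
  FATALITIES = c(1L, 2L, 5L, 5L, 3L, 4L),
  MONTH      = c("04", "04", "04", "04", "05", "05"),
  stringsAsFactors = FALSE
)

# Summary with one row per month; the FATALITIES column holds the group mean
monthly <- aggregate(FATALITIES ~ MONTH, data = toy, FUN = mean)

# Find each row's month in the summary, then pull the matching mean
toy$GAVG <- monthly$FATALITIES[match(toy$MONTH, monthly$MONTH)]

print(toy)
```

This gives the same column as ave, at the cost of an extra intermediate data frame, so ave stays the shorter option when only the repeated means are needed.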