Dplyr: rolling average of the last n events before a date, grouped by name

Time: 2018-09-21 06:28:17

Tags: r dplyr

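I want to create a rolling average over the last 3 events for each person (name). I want to use the date of the most recent of those 3 events. Some people may have fewer events than others, and that's fine.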
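The title asks for a rolling average, which gives one value per event. The answers below return one value per name. If the per-event version is wanted, the sketch below averages, for every row, the current event and up to two earlier events of the same person. It assumes dplyr is installed. roll_mean_last() is a hypothetical helper written only for this illustration, and the toy data are made up.

```r
library(dplyr)

# Made-up toy data: one person, four events.
events <- data.frame(
  name = "A",
  GA   = c(20, 2, 2, 8),
  date = as.Date(c("2016-10-17", "2016-10-18", "2016-10-19", "2016-10-20"))
)

# Hypothetical helper: mean of the current value and up to k - 1 earlier ones.
# Early rows average over fewer than k values instead of returning NA.
roll_mean_last <- function(x, k = 3) {
  vapply(seq_along(x), function(i) mean(x[max(1, i - k + 1):i]), numeric(1))
}

rolled <- events %>%
  group_by(name) %>%
  arrange(date, .by_group = TRUE) %>%          # oldest to newest within each name
  mutate(GA_roll3 = roll_mean_last(GA, 3)) %>%
  ungroup()

print(rolled)
```

The first rows use partial windows, which matches the question's note that people with fewer events are acceptable.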

Code to create the data frame:

library(dplyr)

# Create DataFrame

df<- data.frame(name=c('CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE',
                      'JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH',
                      'JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON',
                      'SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON'
                      ),
               GA=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
               SV=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
               GF=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
               SA=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
               date=c("10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016",
                      "10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016",
                      "10/20/2016","10/19/2016","10/18/2016","10/17/2016",
                      "10/20/2016","10/19/2016","10/18/2016","10/17/2016"
                      ),
               stringsAsFactors = FALSE)
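The date column above is stored as text in mm/dd/yyyy form. Sorting those strings gives chronological order only while every date falls in the same year, which is why the answers below parse the column first. The first answer uses lubridate's mdy(). The sketch below instead uses base R's as.Date() as a stand-in, on made-up dates, to show the difference.

```r
# Made-up dates in the same "mm/dd/yyyy" text format as the question's date column.
dates <- c("10/20/2016", "01/05/2017", "10/15/2016")

# Sorting raw strings compares them character by character,
# so January 2017 lands before October 2016.
text_order <- sort(dates)

# Parsing to Date objects gives a true chronological order.
parsed <- as.Date(dates, format = "%m/%d/%Y")
date_order <- dates[order(parsed)]

print(text_order)
print(date_order)
```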

DF:

name          GA  SV  GF  SA  date
CAREY.FAKE    2   2   2   2   10/20/2016
CAREY.FAKE    2   2   2   2   10/19/2016
CAREY.FAKE    2   2   2   2   10/18/2016
CAREY.FAKE    2   2   2   2   10/17/2016
CAREY.FAKE    2   2   2   2   10/16/2016
CAREY.FAKE    20  20  20  20  10/15/2016
JOHN.SMITH    2   2   2   2   10/20/2016
JOHN.SMITH    2   2   2   2   10/19/2016
JOHN.SMITH    2   2   2   2   10/18/2016
JOHN.SMITH    2   2   2   2   10/17/2016
JOHN.SMITH    2   2   2   2   10/16/2016
JOHN.SMITH    20  20  20  20  10/15/2016
JEFF.JOHNSON  2   2   2   2   10/20/2016
JEFF.JOHNSON  2   2   2   2   10/19/2016
JEFF.JOHNSON  2   2   2   2   10/18/2016
JEFF.JOHNSON  20  20  20  20  10/17/2016
SARA.JOHNSON  2   2   2   2   10/20/2016
SARA.JOHNSON  2   2   2   2   10/19/2016
SARA.JOHNSON  2   2   2   2   10/18/2016
SARA.JOHNSON  20  20  20  20  10/17/2016

Code to create the rolling average:

df_next <- df %>%
  group_by(name) %>%
  summarise(last_three_mean = mean(tail(GA,SV,GF,SA, 3)))

Error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: length(n) == 1L is not TRUE.
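The error comes from the call tail(GA, SV, GF, SA, 3). tail() works on one vector at a time, and its second argument is n, the number of elements to keep. Here SV is passed as n. SV is a whole column, not a single number, so tail() stops; the exact message depends on the R version. The sketch below uses made-up two-column toy data and assumes dplyr. It reproduces the failure and then calls tail() once per column instead.

```r
library(dplyr)

# Made-up toy data with the question's layout (two stat columns only).
toy <- data.frame(
  name = c("A", "A", "A", "A", "B", "B"),
  GA   = c(20, 2, 2, 2, 5, 7),
  SV   = c(20, 2, 2, 2, 1, 3),
  date = as.Date(c("2016-10-17", "2016-10-18", "2016-10-19", "2016-10-20",
                   "2016-10-19", "2016-10-20"))
)

# The question's pattern: SV lands in tail()'s n argument, so this errors.
failed <- tryCatch(
  toy %>% group_by(name) %>% summarise(m = mean(tail(GA, SV, 3))),
  error = function(e) TRUE
)

# One tail() per column, after sorting each group from oldest to newest.
fixed <- toy %>%
  group_by(name) %>%
  arrange(date, .by_group = TRUE) %>%
  summarise(GA = mean(tail(GA, 3)), SV = mean(tail(SV, 3)))

print(fixed)
```

Person B has only two events, so tail() returns both and the mean covers two values.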

1 Answer:

Answer 0 (score: 1)

We can arrange by date and then, after grouping by name, use summarise_at to get the mean of the last 3 values for multiple columns:

library(dplyr)
library(lubridate)

df %>%
  group_by(name) %>%
  arrange(name, mdy(date)) %>%                 # oldest to newest, so tail(., 3) = 3 most recent
  summarise_at(2:5, funs(mean(tail(., 3))))
# or select the columns by matching the name pattern
# summarise_at(vars(matches("^[A-Z]{2}$")), funs(mean(tail(., 3))))

# A tibble: 4 x 5
#   name            GA    SV    GF    SA
#   <chr>        <dbl> <dbl> <dbl> <dbl>
# 1 CAREY.FAKE       2     2     2     2
# 2 JEFF.JOHNSON     2     2     2     2
# 3 JOHN.SMITH       2     2     2     2
# 4 SARA.JOHNSON     2     2     2     2
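In later dplyr releases funs() is deprecated, and summarise_at() is superseded by across(). Below is a sketch of the same approach, assuming dplyr 1.0.0 or later. It uses base as.Date() in place of lubridate::mdy(). The question's data frame is rebuilt compactly with rep() to keep the block short. Selecting GA:SA by name also avoids depending on column positions such as 2:5.

```r
library(dplyr)

# The question's data, rebuilt compactly: all four stat columns hold the same values.
vals <- c(2, 2, 2, 2, 2, 20, 2, 2, 2, 2, 2, 20, 2, 2, 2, 20, 2, 2, 2, 20)
d6   <- format(as.Date("2016-10-20") - 0:5, "%m/%d/%Y")  # 10/20/2016 down to 10/15/2016
df <- data.frame(
  name = rep(c("CAREY.FAKE", "JOHN.SMITH", "JEFF.JOHNSON", "SARA.JOHNSON"),
             times = c(6, 6, 4, 4)),
  GA = vals, SV = vals, GF = vals, SA = vals,
  date = c(d6, d6, d6[1:4], d6[1:4]),
  stringsAsFactors = FALSE
)

last_three <- df %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%  # text -> Date
  group_by(name) %>%
  arrange(date, .by_group = TRUE) %>%                    # oldest to newest per name
  summarise(across(GA:SA, ~ mean(tail(.x, 3))))          # mean of the 3 latest rows

print(last_three)
```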

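Alternatively, use top_n to keep the 3 most recent dates per name and then summarise_at:

df %>%
  group_by(name) %>%
  top_n(mdy(date), n = 3) %>%                  # mdy(date) is the weight: keep the 3 latest dates
  summarise_at(2:5, mean)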
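Since dplyr 1.0.0, top_n() is superseded by slice_max(). The sketch below assumes dplyr 1.0.0 or later and uses made-up toy data in which person B has only two events. There is one behavioural difference to watch. By default, both top_n() and slice_max() keep tied rows, so two events on the same date could let a person contribute more than three rows. Setting with_ties = FALSE caps each group at exactly three.

```r
library(dplyr)

# Made-up toy data; B has only two events, which is fine.
toy <- data.frame(
  name = c("A", "A", "A", "A", "B", "B"),
  GA   = c(20, 2, 4, 6, 5, 7),
  SV   = c(20, 2, 2, 2, 1, 3),
  date = c("10/17/2016", "10/18/2016", "10/19/2016", "10/20/2016",
           "10/19/2016", "10/20/2016"),
  stringsAsFactors = FALSE
)

last_three <- toy %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  group_by(name) %>%
  slice_max(date, n = 3, with_ties = FALSE) %>%  # the 3 latest dates per name (fewer if unavailable)
  summarise(across(c(GA, SV), mean))

print(last_three)
```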
