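How can I apply rollapplyr to the following data so that it is sensitive to the date field? Currently I can only apply a rolling window that is blind to the dates, e.g. over 4 quarters with at least 2 observations within those 4 quarters.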
# creating the data
set.seed(123)
# runif(10, 0, 10) gives the values shown in the printout below
data <- data.frame(id=c(1,1,1,1,1,2,2,2,2,2),
date=as.Date(as.character(c(20040930, 20041231, 20050331, 20050630, 20050930, 20040930, 20050331, 20050630, 20051231, 20060331)), format = "%Y%m%d"),
col_a=round(runif(10, 0, 10),0),
col_b=round(runif(10, 0, 10),0))
id date col_a col_b
1 1 2004-09-30 3 10
2 1 2004-12-31 8 5
3 1 2005-03-31 4 7
4 1 2005-06-30 9 6
5 1 2005-09-30 9 1
6 2 2004-09-30 0 9
  <missing: 2004-12-31>
7 2 2005-03-31 5 2
8 2 2005-06-30 9 0
  <missing: 2005-09-30>
9 2 2005-12-31 6 3
10 2 2006-03-31 5 10
This is what I have tried so far, but it does not account for missing records, e.g. the 2005-09-30 record for id = 2.
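A minimal sketch of how the gaps can be located, assuming every date is a calendar quarter end as in the sample: zoo's as.yearqtr() turns each date into a quarter, and the difference to the previous quarter within the same id shows where quarters are missing. The names qtr_index and gap are only illustrative.

```r
library(zoo)

data <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  date  = as.Date(c("2004-09-30", "2004-12-31", "2005-03-31", "2005-06-30",
                    "2005-09-30", "2004-09-30", "2005-03-31", "2005-06-30",
                    "2005-12-31", "2006-03-31")),
  col_a = c(3, 8, 4, 9, 9, 0, 5, 9, 6, 5),
  col_b = c(10, 5, 7, 6, 1, 9, 2, 0, 3, 10)
)

# Quarter index: yearqtr is year + (quarter - 1) / 4, so times 4 gives
# an integer that increases by 1 from one quarter to the next
qtr_index <- round(as.numeric(as.yearqtr(data$date)) * 4)

# Gap (in quarters) to the previous row of the same id; NA for the first row
gap <- ave(qtr_index, data$id, FUN = function(q) c(NA, diff(q)))

# Rows that come right after one or more missing quarters
data[!is.na(gap) & gap > 1, ]
```

For this data only the id = 2 rows dated 2005-03-31 and 2005-12-31 are flagged, each with a gap of 2 quarters.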
library(dplyr)
library(zoo)
data %>%
group_by(id) %>%
mutate(score = (col_a + col_b) / rollapplyr(col_b, 4, mean, fill=NA, by.column=TRUE, partial=2)) %>%
ungroup %>% select(id, date, col_a, col_b, score)
This is what I get after applying the code above:
id date col_a col_b score
<dbl> <date> <dbl> <dbl> <dbl>
1 1 2004-09-30 3 10 NA
2 1 2004-12-31 8 5 1.73
3 1 2005-03-31 4 7 1.5
4 1 2005-06-30 9 6 2.14
5 1 2005-09-30 9 1 2.11
6 2 2004-09-30 0 9 NA
7 2 2005-03-31 5 2 1.27
8 2 2005-06-30 9 0 2.45
9 2 2005-12-31 6 3 2.57
10 2 2006-03-31 5 10 4
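For illustration, the date-blind window for id = 2 can be reproduced on the col_b values alone (col_b_id2 and blind_mean are hypothetical names). Because rollapplyr counts rows rather than quarters, the window ending on 2005-12-31 reaches back to 2004-09-30:

```r
library(zoo)

# col_b for id = 2, one value per observed row (two quarters are missing)
col_b_id2 <- c(9, 2, 0, 3, 10)

# Positional window: the last 4 rows, whatever dates they fall on;
# partial = 2 computes windows of at least 2 rows, the rest are filled with NA
blind_mean <- rollapplyr(col_b_id2, 4, mean, fill = NA, partial = 2)
blind_mean
```

The fourth value is mean(c(9, 2, 0, 3)) = 3.5, which gives the 2.57 in the output above; a window of 4 calendar quarters would only contain 2, 0 and 3.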
However, what I expect is for it to take the missing quarters into account automatically. This is my expected output:
id date col_a col_b score
<dbl> <date> <dbl> <dbl> <dbl>
1 1 2004-09-30 3 10 NA
2 1 2004-12-31 8 5 1.73
3 1 2005-03-31 4 7 1.5
4 1 2005-06-30 9 6 2.14
5 1 2005-09-30 9 1 2.11
6 2 2004-09-30 0 9 NA
  <missing: 2004-12-31>
7 2 2005-03-31 5 2 1.27
8 2 2005-06-30 9 0 2.45
  <missing: 2005-09-30>
9 2 2005-12-31 6 3 **5.4**
10 2 2006-03-31 5 10 **3.46**
Note that, for example, for row 10 the mean should use n = 3 instead of n = 4, because the missing row must not be included.
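The expected values for rows 9 and 10 can be checked by hand. The windows below are written out from the table above, with NA standing for the missing 2005-09-30 quarter; the variable names are only illustrative.

```r
# col_b for id = 2 over the 4 quarters ending 2005-12-31
# (2005-03-31, 2005-06-30, 2005-09-30 [missing], 2005-12-31)
window_row9 <- c(2, 0, NA, 3)
# ... and over the 4 quarters ending 2006-03-31
window_row10 <- c(0, NA, 3, 10)

# (col_a + col_b) divided by the mean of the observed values only
score_row9  <- (6 + 3)  / mean(window_row9,  na.rm = TRUE)  # 9 / (5/3)
score_row10 <- (5 + 10) / mean(window_row10, na.rm = TRUE)  # 15 / (13/3)

sum(!is.na(window_row10))              # n used for the mean in row 10
round(c(score_row9, score_row10), 2)
```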
Answer
One option is to use complete to create the missing 'date' rows for every 'id' before the group_by:
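To see what complete does here, a minimal sketch on the same data, assuming tidyr is installed: without a fill argument the added id/date combinations get NA. Note that complete only uses dates that occur somewhere in data, so a quarter that is missing for every id would still not be added.

```r
library(tidyr)

data <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  date  = as.Date(c("2004-09-30", "2004-12-31", "2005-03-31", "2005-06-30",
                    "2005-09-30", "2004-09-30", "2005-03-31", "2005-06-30",
                    "2005-12-31", "2006-03-31")),
  col_a = c(3, 8, 4, 9, 9, 0, 5, 9, 6, 5),
  col_b = c(10, 5, 7, 6, 1, 9, 2, 0, 3, 10)
)

# One row per (id, date) pair over the dates seen anywhere in `data`;
# pairs that were not observed get NA in col_a and col_b
filled <- complete(data, id, date)

filled[filled$id == 2, ]
```

Here the 2 ids times 7 distinct dates give 14 rows, 4 of which are new (2004-12-31 and 2005-09-30 for id 2, 2005-12-31 and 2006-03-31 for id 1).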
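library(tidyverse)
library(zoo)
# add a row for every id/date combination, filling the new rows with 0
complete(data, id, date, fill = list(col_a = 0, col_b = 0)) %>%
  group_by(id) %>%
  # rolling sum of col_b over the last 4 rows (windows of at least 2 rows)
  mutate(score = (col_a + col_b) /
           rollapplyr(col_b, 4, sum, fill=NA, by.column=TRUE, partial=2)) %>%
  ungroup %>%
  select(id, date, col_a, col_b, score) %>%
  # keep only the rows that were in the original data
  right_join(data)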
# A tibble: 10 x 5
# id date col_a col_b score
# <dbl> <date> <dbl> <dbl> <dbl>
# 1 1 2004-09-30 3 10 NA
# 2 1 2004-12-31 8 5 0.867
# 3 1 2005-03-31 4 7 0.5
# 4 1 2005-06-30 9 6 0.536
# 5 1 2005-09-30 9 1 0.526
# 6 2 2004-09-30 0 9 NA
# 7 2 2005-03-31 5 2 0.636
# 8 2 2005-06-30 9 0 0.818
# 9 2 2005-12-31 6 3 1.8
#10 2 2006-03-31 5 10 1.15
# data
data <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
date = structure(c(12691,
12783, 12873, 12964, 13056, 12691, 12873, 12964, 13148, 13238
), class = "Date"), col_a = c(3, 8, 4, 9, 9, 0, 5, 9, 6, 5),
col_b = c(10, 5, 7, 6, 1, 9, 2, 0, 3, 10)), row.names = c(NA,
-10L), class = "data.frame")
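A possible variant of the code above that reproduces the expected output in the question (a sketch, not the answerer's own version): the new rows are left as NA instead of 0, the window mean is taken over the non-missing values only, and a hypothetical helper mean_min2() returns NA unless at least 2 of the 4 quarters are observed. It assumes dplyr, tidyr and zoo are installed and that every quarter of interest appears in the data for at least one id.

```r
library(dplyr)
library(tidyr)
library(zoo)

data <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  date  = as.Date(c("2004-09-30", "2004-12-31", "2005-03-31", "2005-06-30",
                    "2005-09-30", "2004-09-30", "2005-03-31", "2005-06-30",
                    "2005-12-31", "2006-03-31")),
  col_a = c(3, 8, 4, 9, 9, 0, 5, 9, 6, 5),
  col_b = c(10, 5, 7, 6, 1, 9, 2, 0, 3, 10)
)

# Hypothetical helper: mean of the non-missing values in the window,
# or NA when fewer than 2 quarters in the window were observed
mean_min2 <- function(x) {
  if (sum(!is.na(x)) >= 2) mean(x, na.rm = TRUE) else NA_real_
}

result <- data %>%
  complete(id, date) %>%                 # missing quarters become NA rows
  arrange(id, date) %>%
  group_by(id) %>%
  mutate(score = (col_a + col_b) /
           rollapplyr(col_b, 4, mean_min2, partial = TRUE)) %>%
  ungroup() %>%
  semi_join(data, by = c("id", "date")) %>%  # drop the filler rows again
  select(id, date, col_a, col_b, score)

result
```

Counting the non-missing values inside the function, rather than relying on partial = 2, matters because partial counts window positions, and after complete a window can contain filler NA rows.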