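I'm trying to write an R script, using lubridate, data.table and dplyr, that I have to run every quarter. I'm trying to automate it as much as possible so that I might only need to change the directory to run it. Basically, my problem is that I need to create a dataset from another dataset (dataset A), which looks like this: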
ID fromdate todate Quarters Cost Location
1: 29 2015-03-08 2015-03-25 2015Q1 13747.12 Orlando
2: 29 2015-04-08 2015-04-08 2015Q2 1555.08 Miami
3: 29 2015-07-08 2015-07-08 2015Q3 961.51 Miami
4: 29 2015-09-23 2015-09-24 2015Q3 3492.00 Orlando
5: 29 2015-09-24 2015-10-03 2015Q4 9948.56 Orlando
---
593: 174 2017-03-01 2017-03-31 2017Q1 2794.26 Orlando
594: 174 2017-04-05 2017-04-05 2017Q2 425.86 Miami
595: 174 2017-04-03 2017-04-28 2017Q2 2400.24 Orlando
596: 174 2017-05-01 2017-05-31 2017Q2 2805.46 Orlando
597: 174 2017-06-02 2017-06-30 2017Q2 2603.51 Orlando
Expanded for a single ID, it looks like this:
ID fromdate todate Quarters CLM_PMT_AMT Location
1: 29 2015-03-08 2015-03-25 2015Q1 13747.12 Orlando
2: 29 2015-04-08 2015-04-08 2015Q2 1555.08 Miami
3: 29 2015-07-08 2015-07-08 2015Q3 961.51 Miami
4: 29 2015-09-23 2015-09-24 2015Q3 3492.00 Orlando
5: 29 2015-09-24 2015-10-03 2015Q4 9948.56 Orlando
6: 29 2015-10-03 2015-10-03 2015Q4 39.33 Orlando
7: 29 2015-10-05 2015-10-05 2015Q4 192.26 Miami
8: 29 2015-10-11 2015-10-14 2015Q4 9478.80 Orlando
9: 29 2015-10-15 2015-10-27 2015Q4 20655.46 Orlando
10: 29 2015-10-06 2015-10-31 2015Q4 1061.70 Orlando
11: 29 2015-11-03 2015-11-03 2015Q4 319.29 Orlando
12: 29 2015-11-05 2015-11-05 2015Q4 894.58 Miami
13: 29 2015-11-05 2015-11-28 2015Q4 21678.48 Orlando
14: 29 2015-12-06 2015-12-06 2015Q4 248.98 Miami
15: 29 2015-12-16 2015-12-25 2015Q4 9948.56 Orlando
16: 29 2015-12-01 2015-12-29 2015Q4 1417.91 Orlando
17: 29 2015-12-30 2016-01-01 2016Q1 9514.55 Orlando
18: 29 2016-01-05 2016-01-10 2016Q1 9682.28 Orlando
19: 29 2016-01-25 2016-01-27 2016Q1 6764.50 Orlando
20: 29 2016-01-03 2016-01-30 2016Q1 1564.87 Orlando
21: 29 2016-02-15 2016-02-17 2016Q1 3908.10 Orlando
22: 29 2016-02-02 2016-02-27 2016Q1 1886.87 Orlando
23: 29 2016-03-03 2016-03-03 2016Q1 76.58 Miami
24: 29 2016-03-03 2016-03-06 2016Q1 3213.78 Orlando
25: 29 2016-03-14 2016-03-23 2016Q1 4871.14 Orlando
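What I would like to do with this dataset is take the sum and mean of Cost over a rolling four-quarter window. For example, for ID = 29 & Quarters = 2015Q4 it would be the sum and mean of Cost from Quarters = 2015Q1 to Quarters = 2015Q4, and for Quarters = 2016Q2 the sum and mean should cover Quarters = 2015Q3 to Quarters = 2016Q2. This should be done for each Location, each Quarter and each ID. I know I will probably need to use something like
A %>%
group_by(ID, Quarters, Location) %>%
...
but the problem I run into is that not every ID is represented in every Quarters value. Any suggestions on how to do this? I'm at my wit's end!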
Answer 0 (score: 2)
You can use tidyr::complete to add the missing quarters. For example:
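library(tidyverse)
dt %>%
  mutate(Quarters = as.factor(Quarters)) %>%
  group_by(ID, Location, Quarters) %>%
  # mean cost per ID, Location and quarter
  summarise(CLM_PMT_AMT = mean(CLM_PMT_AMT, na.rm = TRUE), .groups = "drop_last") %>%
  # still grouped by ID and Location: add a zero row for every missing quarter level
  complete(Quarters, fill = list(CLM_PMT_AMT = 0)) %>%
  # right-aligned rolling mean over the current and the three previous quarters
  mutate(roll = zoo::rollmeanr(CLM_PMT_AMT, k = 4, fill = NA))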
# # A tibble: 10 x 5
# # Groups: ID, Location [2]
# ID Location Quarters CLM_PMT_AMT roll
# <int> <chr> <fctr> <dbl> <dbl>
# 1 29 Miami 2015Q1 0 NA
# 2 29 Miami 2015Q2 1555 NA
# 3 29 Miami 2015Q3 962 NA
# 4 29 Miami 2015Q4 445 740
# 5 29 Miami 2016Q1 76.6 760
# 6 29 Orlando 2015Q1 13747 NA
# 7 29 Orlando 2015Q2 0 NA
# 8 29 Orlando 2015Q3 3492 NA
# 9 29 Orlando 2015Q4 8283 6381
# 10 29 Orlando 2016Q1 5176 4238
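The question asks for both a sum and a mean over the four-quarter window, whereas the pipeline above averages the quarterly means. A minimal sketch of one way to get both, assuming the mean is meant per claim row in the window: sum the costs and count the rows in each quarter, complete the missing quarters with zeros, then divide the rolling sum by the rolling count. The `claims` data and the `quarter_levels()` helper are made up for illustration; the helper builds every "YYYYQn" label between the first and last quarter so that a quarter absent from the whole dataset still gets a row.

```r
library(dplyr)
library(tidyr)
library(zoo)

# Toy data, invented for this sketch; Cost stands in for CLM_PMT_AMT
claims <- tibble(
  ID       = rep(29, 6),
  Location = rep("Miami", 6),
  Quarters = c("2015Q1", "2015Q2", "2015Q2", "2015Q4", "2016Q1", "2016Q1"),
  Cost     = c(100, 200, 300, 400, 500, 700)
)

# Hypothetical helper: all "YYYYQn" labels between the earliest and latest quarter
quarter_levels <- function(q) {
  idx <- as.integer(substr(q, 1, 4)) * 4L + as.integer(substr(q, 6, 6)) - 1L
  all_idx <- seq(min(idx), max(idx))
  paste0(all_idx %/% 4L, "Q", all_idx %% 4L + 1L)
}

rolled <- claims %>%
  mutate(Quarters = factor(Quarters, levels = quarter_levels(Quarters))) %>%
  group_by(ID, Location, Quarters) %>%
  # total cost and number of claim rows per quarter
  summarise(q_sum = sum(Cost), q_n = n(), .groups = "drop_last") %>%
  # one row per factor level within each ID/Location group
  complete(Quarters, fill = list(q_sum = 0, q_n = 0L)) %>%
  arrange(Quarters, .by_group = TRUE) %>%
  mutate(
    roll_sum  = rollsumr(q_sum, k = 4, fill = NA),
    roll_n    = rollsumr(q_n, k = 4, fill = NA),
    roll_mean = roll_sum / roll_n  # NaN when the window holds no claims
  ) %>%
  ungroup()

rolled
```

The first three quarters of each group stay NA because a full four-quarter window is not yet available; whether partial windows should be reported instead is a choice the question leaves open.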
Answer 1 (score: 1)
How about this?
library(data.table)
library(mltools)
dt <- data.table(
id = c(1, 1, 1, 1, 1,
2, 2, 2, 2),
somedate = as.Date(c("2014-2-1", "2014-2-28", "2014-9-30", "2014-12-11", "2015-5-15",
"2014-8-11", "2015-6-30", "2015-6-30", "2015-12-1")),
value = c(1, 2, 3, 4, 5,
10, 20, 30, 40)
)
# Insert YearQuarter
dt[, YearQuarter := mltools::date_factor(somedate, type = "yearquarter")]
dt
id somedate value YearQuarter
1: 1 2014-02-01 1 2014 Q1
2: 1 2014-02-28 2 2014 Q1
3: 1 2014-09-30 3 2014 Q3
4: 1 2014-12-11 4 2014 Q4
5: 1 2015-05-15 5 2015 Q2
6: 2 2014-08-11 10 2014 Q3
7: 2 2015-06-30 20 2015 Q2
8: 2 2015-06-30 30 2015 Q2
9: 2 2015-12-01 40 2015 Q4
# Build table of all possible (id, YearQuarter) pairs based on the levels of dt$YearQuarter
temp <- CJ(id = unique(dt$id), YearQuarter = levels(dt$YearQuarter))
# Aggregate dt to unique (id, YearQuarter) pairs
dt_aggregated <- dt[, list(value_sum = sum(value)), keyby=list(id, YearQuarter)]
# Determine the value_sum in each quarter for each id, via join to temp
result <- dt_aggregated[temp, on=c("id", "YearQuarter")]
result[is.na(value_sum), value_sum := 0]
# Rolling sums by id
result[, RollingAnnualSum := Reduce(`+`, shift(x = value_sum, n = 0:3, fill = 0, type = "lag")), by="id"]
result
id YearQuarter value_sum RollingAnnualSum
1: 1 2014 Q1 3 3
2: 1 2014 Q2 0 3
3: 1 2014 Q3 3 6
4: 1 2014 Q4 4 10
5: 1 2015 Q1 0 7
6: 1 2015 Q2 5 12
7: 1 2015 Q3 0 9
8: 1 2015 Q4 0 5
9: 2 2014 Q1 0 0
10: 2 2014 Q2 0 0
11: 2 2014 Q3 10 10
12: 2 2014 Q4 0 10
13: 2 2015 Q1 0 10
14: 2 2015 Q2 50 60
15: 2 2015 Q3 0 50
16: 2 2015 Q4 40 90
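As an alternative to the Reduce/shift idiom, data.table's own frollsum() and frollmean() can compute the same window. The sketch below is not part of the answer: it re-enters the value_sum column from the result above by hand, and the column names RollingAnnualSum_full and RollingQuarterMean are invented here. Unlike the shift version with fill = 0, these functions return NA until four quarters are available, so the first three rows per id differ from the table above while the later rows match.

```r
library(data.table)

quarters <- c("2014 Q1", "2014 Q2", "2014 Q3", "2014 Q4",
              "2015 Q1", "2015 Q2", "2015 Q3", "2015 Q4")

# Quarterly totals copied from the result table above
quarterly <- data.table(
  id          = rep(c(1, 2), each = 8),
  YearQuarter = rep(quarters, times = 2),
  value_sum   = c(3, 0, 3, 4, 0, 5, 0, 0,
                  0, 0, 10, 0, 0, 50, 0, 40)
)

# Right-aligned window of 4 quarters, computed separately for each id
quarterly[, RollingAnnualSum_full := frollsum(value_sum, n = 4), by = "id"]

# Mean of the quarterly totals in the window (not a per-row mean of value)
quarterly[, RollingQuarterMean := frollmean(value_sum, n = 4), by = "id"]

quarterly
```

Treating incomplete windows as NA rather than as partial sums is a design choice; if partial sums are wanted, the shift-based version above already provides them.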