I would like to know how to create future timestamps for every combination of parameters (BranchId, Hour, weekdays).
BranchId Hour weekdays ActivityDate Total
1 11 3 2018-02-06T00:00:00 18
1 11 3 2018-02-13T00:00:00 23
1 12 3 2018-02-06T00:00:00 15
1 12 3 2018-02-13T00:00:00 13
1 13 3 2018-02-06T00:00:00 24
1 13 3 2018-02-13T00:00:00 22
At the moment I can only create future timestamps for a single combination, like this:
BranchId Hour weekdays ActivityDate Total
1 11 3 2018-02-06T00:00:00Z 18
1 11 3 2018-02-13T00:00:00Z 23
1 11 3 2018-02-20T00:00:00Z
1 11 3 2018-02-27T00:00:00Z
1 11 3 2018-03-06T00:00:00Z
1 11 3 2018-03-13T00:00:00Z
The code is:
min.date <- min(data$ActivityDate)
max.date <- max(data$ActivityDate)
unique.time <- seq(from = min.date, to = max.date, by = "week")
forecast.time <- seq(from = max.date, by = observation.freq, length.out = 4 + 1)[-1]
all.time <- c(unique.time, forecast.time)
all.time <- data.frame(BranchId = data$BranchId[1], Hour = data$Hour[1], weekdays = data$weekdays[1],ActivityDate = all.time)
# Left join the combination with the original data (join() with type = "left" is plyr::join)
data <- join(all.time, data, by = c("BranchId","Hour", "weekdays", "ActivityDate"), type = "left")
When I apply this code to the full data, the result is wrong: it does not create future timestamps for each combination:
BranchId Hour weekdays ActivityDate Total
1 11 3 2018-02-06T00:00:00Z 18
1 11 3 2018-02-13T00:00:00Z 23
1 12 3 2018-02-20T00:00:00Z
1 12 3 2018-02-27T00:00:00Z
1 13 3 2018-03-06T00:00:00Z
1 13 3 2018-03-13T00:00:00Z
Do I need to write several functions or a for loop to approach this?
Answer 0: (score: 0)
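In R, you can create the extra dates at weekly intervals with the code below. With the function pad from the padr package, you define the interval at which dates are added to the data.frame. Use the group option to tell the function which variables define the groups for which a new timeline is built or missing dates are filled in. You can specify a start date and an end date so that every date between them is generated; otherwise the min and max dates available in the data.frame are used.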
min.date <- min(df$ActivityDate)
max.date <- as.Date("2018-03-13T00:00:00Z")
library(padr)
df <- pad(df, interval = "week" , start_val = min.date, end_val = max.date, group = c("BranchId", "Hour", "weekdays"))
# this step can be skipped if you want to keep NA's instead of 0
df <- fill_by_value(df, value = 0)
df
BranchId Hour weekdays ActivityDate Total
1 1 11 3 2018-02-06 18
2 1 11 3 2018-02-13 23
3 1 11 3 2018-02-20 0
4 1 11 3 2018-02-27 0
5 1 11 3 2018-03-06 0
6 1 11 3 2018-03-13 0
7 1 12 3 2018-02-06 15
8 1 12 3 2018-02-13 13
9 1 12 3 2018-02-20 0
10 1 12 3 2018-02-27 0
11 1 12 3 2018-03-06 0
12 1 12 3 2018-03-13 0
13 1 13 3 2018-02-06 24
14 1 13 3 2018-02-13 22
15 1 13 3 2018-02-20 0
16 1 13 3 2018-02-27 0
17 1 13 3 2018-03-06 0
18 1 13 3 2018-03-13 0
Data:
df <- structure(list(BranchId = c(1L, 1L, 1L, 1L, 1L, 1L),
Hour = c(11L, 11L, 12L, 12L, 13L, 13L),
weekdays = c(3L, 3L, 3L, 3L, 3L, 3L),
ActivityDate = as.Date(c("2018-02-06T00:00:00", "2018-02-13T00:00:00","2018-02-06T00:00:00",
"2018-02-13T00:00:00", "2018-02-06T00:00:00", "2018-02-13T00:00:00")),
Total = c(18L, 23L, 15L, 13L, 24L, 22L)),
.Names = c("BranchId", "Hour", "weekdays", "ActivityDate", "Total"),
class = "data.frame", row.names = c(NA, -6L))
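If adding a package is not an option, the same per-group timeline can be sketched in base R, without a for loop. This is only a sketch under stated assumptions: the name n.ahead is hypothetical, the four-week horizon is taken from length.out = 4 + 1 in the question, and future rows are left as NA rather than filled with 0.

```r
# Sample data from the question, with ActivityDate stored as Date
df <- data.frame(
  BranchId = 1L,
  Hour = c(11L, 11L, 12L, 12L, 13L, 13L),
  weekdays = 3L,
  ActivityDate = as.Date(c("2018-02-06", "2018-02-13", "2018-02-06",
                           "2018-02-13", "2018-02-06", "2018-02-13")),
  Total = c(18L, 23L, 15L, 13L, 24L, 22L)
)

n.ahead <- 4  # assumed forecast horizon in weeks (hypothetical name)

# Weekly timeline: the observed range plus n.ahead future weeks
all.time <- seq(from = min(df$ActivityDate),
                to = max(df$ActivityDate) + 7 * n.ahead,
                by = "week")

# One row per distinct (BranchId, Hour, weekdays) combination
combos <- unique(df[, c("BranchId", "Hour", "weekdays")])

# Cross join: merge() with by = NULL returns the Cartesian product
grid <- merge(combos, data.frame(ActivityDate = all.time), by = NULL)

# Left join the observed totals; future rows get NA in Total
result <- merge(grid, df,
                by = c("BranchId", "Hour", "weekdays", "ActivityDate"),
                all.x = TRUE)
result <- result[order(result$BranchId, result$Hour,
                       result$weekdays, result$ActivityDate), ]
result
```

The key difference from the code in the question is that the timeline is crossed with every distinct combination instead of only the first row (data$BranchId[1], data$Hour[1], data$weekdays[1]).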
Answer 1: (score: 0)
This can be done in R with group_by and complete:
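library(dplyr); library(tidyr)  # group_by() and %>% come from dplyr, complete() from tidyr
df %>% group_by(BranchId, Hour, weekdays) %>% complete(ActivityDate = seq.Date(min(ActivityDate), min(ActivityDate) + 42, by = "week"), fill = list(Total = 0))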
# A tibble: 21 x 5
# Groups: BranchId, Hour, weekdays [3]
BranchId Hour weekdays ActivityDate Total
<int> <int> <int> <date> <dbl>
1 1 11 3 2018-02-06 18
2 1 11 3 2018-02-13 23
3 1 11 3 2018-02-20 0
4 1 11 3 2018-02-27 0
5 1 11 3 2018-03-06 0
6 1 11 3 2018-03-13 0
7 1 11 3 2018-03-20 0
8 1 12 3 2018-02-06 15
9 1 12 3 2018-02-13 13
10 1 12 3 2018-02-20 0
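The call above extends each group to min(ActivityDate) + 42, which yields one more week (2018-03-20) than the four future weeks shown in the question. A possible variation, assuming dplyr and tidyr are installed, ties the end of the sequence to the last observed date instead. The names n.ahead and end.date are illustrative only and do not come from the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, with ActivityDate stored as Date
df <- data.frame(
  BranchId = 1L,
  Hour = c(11L, 11L, 12L, 12L, 13L, 13L),
  weekdays = 3L,
  ActivityDate = as.Date(c("2018-02-06", "2018-02-13", "2018-02-06",
                           "2018-02-13", "2018-02-06", "2018-02-13")),
  Total = c(18L, 23L, 15L, 13L, 24L, 22L)
)

n.ahead <- 4                                       # assumed horizon in weeks
end.date <- max(df$ActivityDate) + 7 * n.ahead     # last future date to create

out <- df %>%
  group_by(BranchId, Hour, weekdays) %>%
  # Build a weekly sequence per group up to the shared end date
  complete(ActivityDate = seq(min(ActivityDate), end.date, by = "week"),
           fill = list(Total = 0)) %>%
  ungroup()
out
```

Dropping the fill argument keeps the future rows as NA, which may be preferable if the totals are to be forecast rather than treated as zero.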