I have a large dataset (19,945,862 observations of 3 variables) and am fitting time series models for roughly 50,000 members. Below is the script I am using.
memberid date amount
165595 17471 2
165596 17471 1.99
165596 17471 1.29
165597 17471 0.88
... (19945858 more obs)
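library(tidyverse)  # %>%, mutate(), group_by(), nest(), map(); older tidyquant versions attached this automatically
library(sweep)
library(tidyquant)
library(forecast)
library(timetk)
# Round x down to the nearest multiple of 1/52 and tag the result with class "yearweek"
yearweek <- function(x) structure(floor(52*x + .00001)/52, class = "yearweek")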
Rearrange the data by week
ts$date <- as.numeric(ts$date)
weekly_amt_by_member <- ts %>%
  mutate(order.week = as_date(yearweek(date))) %>%
  group_by(memberid, order.week) %>%
  summarise(total.amt = sum(amount))
Use nest() to bundle the data by memberid
weekly_amt_by_member_nest <- weekly_amt_by_member %>%
  group_by(memberid) %>%
  nest(.key = "data.tbl")
Use mutate() and map() to coerce each member's nested data to a ts object
weekly_amt_by_member_ts <- weekly_amt_by_member_nest %>%
  mutate(data.ts = map(.x = data.tbl,
                       .f = tk_ts,
                       select = -order.week,
                       start = 2017,
                       freq = 52))
Fit the models
weekly_amt_by_member_fit <- weekly_amt_by_member_ts %>%
  mutate(fit.ets = map(data.ts, ets))
Forecast the models
weekly_amt_by_member_fcast <- weekly_amt_by_member_fit %>%
  mutate(fcast.ets = map(fit.ets, forecast, h = 52))
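All of the steps above run fine, but the last one (forecasting the models) gives me the following error:
Error in mutate_impl(.data, dots) : Evaluation error: 'from' must be a finite number.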
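Because map() aborts on the first failing member, the error does not say which of the ~50,000 series triggers it. One way to narrow that down might be to catch the error per element instead of letting it stop the whole mutate(). The sketch below is base R only and runs on toy data; `try_each`, `toy_inputs`, and `toy_out` are hypothetical names that are not part of the script above, and the toy function merely stands in for forecast() by calling seq() with a non-finite `from`.

```r
# Hypothetical helper (not from the script above): apply `f` to every element
# and record the error message instead of aborting on the first failure.
try_each <- function(items, f, ...) {
  lapply(items, function(item) {
    tryCatch(
      list(result = f(item, ...), error = NA_character_),
      error = function(e) list(result = NULL, error = conditionMessage(e))
    )
  })
}

# Toy stand-in for the forecasting step: seq() fails when `from` is not finite.
toy_inputs <- list(a = 1, b = NA_real_, c = 3)
toy_out <- try_each(toy_inputs, function(x) seq(from = x, by = 1, length.out = 3))

# Names of the elements whose call raised an error
failed <- names(toy_out)[!vapply(toy_out, function(r) is.na(r$error), logical(1))]
print(failed)
```

Applied to the real pipeline, something along the lines of try_each(weekly_amt_by_member_fit$fit.ets, forecast, h = 52) should identify the rows whose forecast fails, so that the corresponding data.ts can be inspected; this has not been run against the actual data.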