tsibble: how do you get around implicit gaps when there are none?

Time: 2019-12-31 01:51:02

Tags: r time-series tsibble fable

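I am new to the tsibble package. I have monthly data that I have to coerce into a tsibble in order to use the fable package. I am running into a couple of issues: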

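  • The index variable (from my testing) doesn't appear to be of class Date, even though I applied lubridate's ymd function.
  • The has_gaps function returns FALSE, but when I model the data I get the error: ".data contains implicit gaps in time".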
library(dplyr)
library(fable)
library(lubridate)
library(tsibble)

test <- data.frame(
   YearMonth = c(20160101, 20160201, 20160301, 20160401, 20160501, 20160601,
                 20160701, 20160801, 20160901, 20161001, 20161101, 20161201),
      Claims = c(13032647, 1668005, 24473616, 13640769, 17891432, 11596556,
                 23176360, 7885872, 11948461, 16194792, 4971310, 18032363),
     Revenue = c(12603367, 18733242, 5862766, 3861877, 15407158, 24534258,
                 15633646, 13720258, 24944078, 13375742, 4537475, 22988443)
)

test_ts <- test %>% 
  mutate(YearMonth = ymd(YearMonth)) %>% 
  as_tsibble(
    index = YearMonth,
    regular = FALSE       #because it picks up gaps when I set it to TRUE
    )

# Are there any gaps?
has_gaps(test_ts, .full = T)

model_new <- test_ts %>% 
  model(
  snaive = SNAIVE(Claims))

Any help would be appreciated.

2 Answers:

Answer 0: (score: 1)

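It looks like as_tsibble cannot correctly infer the interval from the YearMonth column because it is a Date class object. Buried in the "Index" section of the help page is a note that this can be a problem: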

  

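For a tbl_ts of regular interval, a choice of index representation has to be made. For example, monthly data should correspond to a time index created by yearmonth or zoo::yearmon, instead of Date or POSIXct.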

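As the excerpt says, you can get around the problem with yearmonth(). This first requires a little string manipulation, though, to turn the values into a format that parses correctly (for example, 20160101 becomes "2016-01").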

test_ts <- test %>% 
  mutate(YearMonth = gsub("(.{2})01$", "-\\1", YearMonth) %>% 
           yearmonth()
         ) %>%
  as_tsibble(
    index = YearMonth
  )

Now the model should run without error! I'm not sure why the has_gaps() test says everything is fine in your example...
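One way to see what is going on, offered as a sketch rather than a confirmed account of tsibble's internals: with a Date index and the default regular = TRUE, the interval is inferred from the spacing between the dates. For first-of-month dates that spacing varies (31, 29, 31, 30, ... days), so the inferred interval comes out as one day and every day in between counts as a gap. The has_gaps() result of FALSE in the question is probably a side effect of regular = FALSE, since gaps are not defined for an irregular series. The names daily_ts and monthly_ts below are mine, not from the question.

```r
library(dplyr)
library(tsibble)

# Same dates and Claims values as in the question (Revenue omitted for brevity)
test <- data.frame(
  YearMonth = seq(as.Date("2016-01-01"), by = "month", length.out = 12),
  Claims = c(13032647, 1668005, 24473616, 13640769, 17891432, 11596556,
             23176360, 7885872, 11948461, 16194792, 4971310, 18032363)
)

# Date index, default regular = TRUE: treated as a daily series with holes
daily_ts <- as_tsibble(test, index = YearMonth)
interval(daily_ts)   # inspect the inferred interval
has_gaps(daily_ts)   # gaps are reported for this version

# yearmonth index: one observation per month, nothing missing
monthly_ts <- test %>%
  mutate(YearMonth = yearmonth(YearMonth)) %>%
  as_tsibble(index = YearMonth)
interval(monthly_ts)
has_gaps(monthly_ts)
```

Checking interval() right after as_tsibble() is a cheap way to confirm that the index means what you think it means before fitting anything.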

Answer 1: (score: 1)

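You have a daily index, but you want a monthly index. The easiest way is to use the tsibble::yearmonth() function, but you need to convert the dates to character first.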

library(dplyr)
library(tsibble)

test <- data.frame(
  YearMonth = c(20160101, 20160201, 20160301, 20160401, 20160501, 20160601,
    20160701, 20160801, 20160901, 20161001, 20161101, 20161201),
  Claims = c(13032647, 1668005, 24473616, 13640769, 17891432, 11596556,
    23176360, 7885872, 11948461, 16194792, 4971310, 18032363),
  Revenue = c(12603367, 18733242, 5862766, 3861877, 15407158, 24534258,
    15633646, 13720258, 24944078, 13375742, 4537475, 22988443)
)

test_ts <- test %>%
  mutate(YearMonth = yearmonth(as.character(YearMonth))) %>%
  as_tsibble(index = YearMonth)