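I want to run a time series analysis on the following data, but I can't convert it into a time series object. The data can be downloaded from this link: https://datamarket.com/data/set/22ox/monthly-milk-production-pounds-per-cow-jan-62-dec-75#!ds=22ox&display=line
I tried the str_split_fixed function to split it into two columns, but recombining the result into a time series after the split is the problem.
This is what I tried: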
#Convert it into Time series
#Train Data
ds.ts<-ts(ds$V2,start = c(1962,1),end = c(1974,12),frequency = 12)
ds.ts
plot(ds.ts)
plot(decompose(ds.ts))
#Test Data
ts.1975<-ts(ds$V2,start=c(1975,1),end=c(1975,12),frequency = 12)
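Answer 0 (score: 0)
I exported the data as .tsv (tab-separated), but .csv works just as well. Then read it with data.table and use substr to extract the first 4 characters as the year and the last 2 characters as the month, converting both to integer: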
library(data.table)
dt <- fread("~/Downloads/monthly-milk-production-pounds-p.tsv")
dt[, `:=`(
year = as.integer(substr(Month, start = 1, stop = 4)),
month = as.integer(substr(Month, start = 6, stop = 7)))]
> dt
     Month Monthly milk production: pounds per cow. Jan 62 - Dec 75 year month
1: 1962-01                                                      589 1962     1
2: 1962-02                                                      561 1962     2
3: 1962-03                                                      640 1962     3
4: 1962-04                                                      656 1962     4
5: 1962-05                                                      727 1962     5
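With the year and month available as integers, one way to get from the table to the train/test series the question asks for is to build a single ts object over the whole range and then cut it with window(). The sketch below is only an illustration under stated assumptions: dt_demo is a hypothetical stand-in for dt, its value column holds dummy numbers rather than the real milk data, and the names milk.ts, train.ts and test.ts are made up here.

```r
# Hypothetical stand-in for dt: 168 monthly rows, Jan 1962 to Dec 1975.
# The value column holds dummy numbers, NOT the real milk production data.
dt_demo <- data.frame(
  year  = rep(1962:1975, each = 12),
  month = rep(1:12, times = 14),
  value = seq_len(168)
)

# Build one monthly series over the full range, starting at the first year/month.
milk.ts <- ts(dt_demo$value,
              start = c(dt_demo$year[1], dt_demo$month[1]),
              frequency = 12)

# Split by time with window() instead of calling ts() twice on the same vector.
train.ts <- window(milk.ts, end = c(1974, 12))    # Jan 1962 to Dec 1974
test.ts  <- window(milk.ts, start = c(1975, 1))   # Jan 1975 to Dec 1975

length(train.ts)  # 156
length(test.ts)   # 12
```

Splitting with window() matters because calling ts() on the full vector with start = c(1975, 1) would label the first 12 observations (the 1962 values) as 1975 rather than selecting the last 12.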
Update based on AkselA's comment:
As AkselA suggested, use as.Date to get a proper date column for the time series:
library(data.table)
dt <- fread("~/Downloads/monthly-milk-production-pounds-p.tsv")
dt[, date_time := as.Date(paste0(Month, "-01"), format="%Y-%m-%d")]
> dt
     Month Monthly milk production: pounds per cow. Jan 62 - Dec 75  date_time
1: 1962-01                                                      589 1962-01-01
2: 1962-02                                                      561 1962-02-01
3: 1962-03                                                      640 1962-03-01
4: 1962-04                                                      656 1962-04-01
5: 1962-05                                                      727 1962-05-01
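A date column also makes it easy to check that no month is missing before handing the values to ts(), which assumes evenly spaced observations. The following is a rough sketch, not part of the original answer: date_time here is a hypothetical vector standing in for dt$date_time, and values holds dummy numbers in place of the production column.

```r
# Hypothetical stand-in for dt$date_time: first day of each month, 1962 to 1975.
month_labels <- sprintf("%d-%02d", rep(1962:1975, each = 12), rep(1:12, times = 14))
date_time <- as.Date(paste0(month_labels, "-01"), format = "%Y-%m-%d")

# ts() assumes a regular grid, so confirm the dates advance one month at a time.
expected <- seq(date_time[1], by = "month", length.out = length(date_time))
stopifnot(all(date_time == expected))

# Derive the start of the series from the first date instead of hard-coding it.
start_year  <- as.integer(format(date_time[1], "%Y"))
start_month <- as.integer(format(date_time[1], "%m"))

values <- seq_along(date_time)  # dummy values, NOT the real milk data
milk.ts <- ts(values, start = c(start_year, start_month), frequency = 12)

start(milk.ts)  # 1962 1
end(milk.ts)    # 1975 12
```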