R: How do I handle time series with sub-hourly data?

Time: 2016-07-02 11:51:27

Tags: r time statistics time-series series

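I have just started using R and have worked through a few tutorials. However, I am trying to get into time series analysis and have run into a lot of trouble. I created a data frame that looks like this: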

    Date        Time        T1
 1  2014-05-22  15:15:00    21.6
 2  2014-05-22  15:20:00    21.2
 3  2014-05-22  15:25:00    21.3
 4  2014-05-22  15:30:00    21.5
 5  2014-05-22  15:35:00    21.1
 6  2014-05-22  15:40:00    21.5

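Since I do not want to work with half days, I removed the first and the last day from the data frame. Since R did not recognize the dates and times as such but treated them as factors, I used the lubridate library to convert them properly. Now it looks like this: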

    Date        Time    T1
1   2014-05-23      0S  14.2
2   2014-05-23  5M 0S   14.1
3   2014-05-23  10M 0S  14.6
4   2014-05-23  15M 0S  14.3
5   2014-05-23  20M 0S  14.4
6   2014-05-23  25M 0S  14.5

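Now the trouble really starts. Using the ts function changes the date to 16944 and the time to 0. How do I set up the data frame with the correct start date and frequency? A new set of data comes in every 5 minutes, so the frequency should be 288. I also tried setting the start date as a vector. Since May 22 is the 142nd day of the year, I tried this: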

ts_df <- ts(df, start=c(2014, 142/365), frequency=288) 

There is no error, but when I run start(ts_df) and end(ts_df), I get:

[1] 2013.998
[1] 2058.994

Can anyone give me a hint on how to work with this kind of data?

1 Answer:

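Answer 0 (score: 1)

The "ts" class is generally not a good fit for this type of data. Assuming DF is the data frame shown reproducibly in the Note at the end of this answer, we convert it to a "zoo" class object and then perform some operations on it. The related xts package could also be used.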

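As a rough illustration of the mismatch (a sketch added for clarity, not part of the original answer): "ts" stores only a numeric start, end and frequency, and frequency counts observations per unit of time. With frequency = 288 and a start of 2014, every 288 readings are treated as one year, which appears consistent with the start and end values reported in the question. The data below are hypothetical placeholders, and the 45-day length is an assumption.

```r
# Hypothetical stand-in data: 45 days of 5-minute readings (assumption).
n_per_day <- 24 * 60 / 5            # 288 observations per day
x <- rep(20, 45 * n_per_day)        # placeholder values, not real measurements

# Similar to the question: the time unit is read as a year holding 288 observations.
ts_year <- ts(x, start = c(2014, 1), frequency = n_per_day)
tsp(ts_year)                        # start, end, frequency: spans about 45 "years"

# Alternative assumption: one day is the time unit, counted from day 1.
ts_day <- ts(x, start = c(1, 1), frequency = n_per_day)
tsp(ts_day)                         # the same data now spans about 45 days
```

Treating one day as the time unit gives a consistent ts object, but the calendar dates are lost, which is one reason an indexed class such as zoo is more convenient here.
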
library(zoo)

z <- read.zoo(DF, index = 1:2, tz = "")

window(z, start = "2014-05-22 15:25:00")

head(z, 3) # first 3
head(z, -3) # all but last 3
tail(z, 3) # last 3
tail(z, -3) # all but first 3

z[2:4] # 2nd, 3rd and 4th element of z

coredata(z) # numeric vector of data values
time(z) # vector of datetimes

fortify.zoo(z) # data frame whose 2 cols are (1) datetimes and (2) data values

aggregate(z, as.Date, mean) # aggregate to daily means

ym <- aggregate(z, as.yearmon, mean) # aggregate to monthly means
frequency(ym) <- 12 # only needed since ym only has length 1
as.ts(ym) # year/month series can be reasonably converted to ts

plot(z)

library(ggplot2)
autoplot(z)
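
Because the readings arrive every 5 minutes, aggregating to hourly values may be useful as well. The following is a minimal sketch along the same lines as the daily and monthly aggregation above; the series z5, its values and the helper to_hour are made up for illustration, and the UTC time zone is an assumption.

```r
library(zoo)

# Made-up 5-minute series covering two hours (not the asker's data).
idx <- seq(as.POSIXct("2014-05-23 00:00:00", tz = "UTC"),
           by = "5 min", length.out = 24)
z5 <- zoo(seq_along(idx), idx)

# Hypothetical helper: truncate each timestamp to the start of its hour.
to_hour <- function(t) as.POSIXct(trunc(t, "hours"))

# Average all readings that fall within the same hour.
z_hourly <- aggregate(z5, to_hour, mean)
z_hourly
```
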

read.zoo can also be used to read the data in from a file.
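
A minimal sketch of that route, using inline text in place of a file so that it stays self-contained. The rows copy the first lines of the table in the question; the name z2, the UTC time zone and the file name mentioned in the comment are assumptions.

```r
library(zoo)

# Inline text standing in for a file on disk.
Lines <- "Date Time T1
2014-05-22 15:15:00 21.6
2014-05-22 15:20:00 21.2
2014-05-22 15:25:00 21.3"

# Columns 1 and 2 together form the date-time index.
z2 <- read.zoo(text = Lines, header = TRUE, index.column = 1:2, tz = "UTC")

# For a real file, pass its path instead of text = Lines,
# e.g. read.zoo("temps.txt", ...) where "temps.txt" is a hypothetical name.
z2
```
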

Note: The DF used above, in reproducible form:

DF <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2014-05-22", 
class = "factor"), 
    Time = structure(1:6, .Label = c("15:15:00", "15:20:00", 
    "15:25:00", "15:30:00", "15:35:00", "15:40:00"), class = "factor"), 
    T1 = c(21.6, 21.2, 21.3, 21.5, 21.1, 21.5)), .Names = c("Date", 
"Time", "T1"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6"))