R time series, complicated sequence

Time: 2011-03-18 06:37:33

Tags: r merge zoo seq

I am trying to merge two different time series in R that have the following characteristics:

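  1. The data must fall between 08:30 and 15:00 each day.
  2. The data spans several weeks, not just a single day.
  3. There are gaps at random places in the data.
  4. The two data sets must cover the same time intervals with no gaps.
  5. I want to merge the two data sets onto a sequence containing every time from 08:30 to 15:00, and wherever either data set has a gap I want to carry the previous value forward (or the next value backward).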

    # I have verified that the csv files are imported correctly
    # The first column contains dates, and the strptime
    # function can convert strings into Date/Time objects.
    #
    sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S")
    sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S")
    
    # The second column contains the close.
    # I use the zoo function to create zoo objects from that data.
    # But for some reason this ends up creating duplicates PROBLEM 1
    #
    a <- zoo(sec1[,2], sec1_dates)
    b <- zoo(sec2[,2], sec2_dates)
    
    # I know that I need use seq to fill in gaps but I am clueless as to how
    # Once I have the proper seq I can just use na.locf to fill the appropriate values
    # HOWEVER seq(start(sec1_dates), end(sec1_dates), "min") would end up returning
    # every minute for each day, and I only want 08:30 to 15:30. PROBLEM 2
    
    # The merge function can combine two zoo objects, in union
    # Obviously this fails because the two index sizes don't match PROBLEM 3
    #
    t.zoo <- merge(a, b, all=TRUE)
    
      

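James, you were right about Problem 1. Thank you. I verified that the csv file had pulled the data twice, and removing the duplicated data fixed the problem. I also used your solution for Problem 2, but I am not sure it is the most efficient way to do what I am trying to do. Eventually I will probably want to use this to run regressions, and at that point I may need some kind of loop to pull in an arbitrary number of data sets. I would welcome any suggestions for optimizing it.

Updated solution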

    library(zoo)
    library(tseries)
    
    # Read the CSV files into data frames
    sec1 <- read.csv("C:\\exportdata\\sec1.csv", stringsAsFactors=FALSE, header=FALSE)
    sec2 <- read.csv("C:\\exportdata\\sec2.csv", stringsAsFactors=FALSE, header=FALSE)
    
    # The first column contains dates.  
    # I use strptime to tell it what format these appear in.
    sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S")
    sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S")
    
    # The second column contains the close prices for the securities.
    # I use the zoo function to create zoo objects from that data.
    # Input =  a vector of data and a vector of dates.
    a <- zoo(sec1[,2], sec1_dates)
    b <- zoo(sec2[,2], sec2_dates)
    
    # create a discrete time-series with the exact time frame desired
    # per tip from James
    template <- zoo(NULL, seq(sec1_dates[1], tail(sec1_dates, 1), "min"))
    template <- template[which(strftime(time(template),"%H:%M")>"08:30" & strftime(time(template),"%H:%M")<"15:00")]
    
    # The merge function is then used to merge
    # 1) each security to the template (uses the discrete date/time range)
    # 2) remove the column of data from template (used only for dates)
    # 3) each security to one another (this was the ultimate goal anyway).
    a.zoo <- merge(a, template, all=TRUE)
    a.zoo$template <- NULL
    b.zoo <- merge(b, template, all=TRUE)
    b.zoo$template <- NULL
    t.zoo <- merge(a.zoo, b.zoo, all=TRUE)
    
    # Fill each NA with the most recent non-NA value (last observation carried forward).
    t <- na.locf(t.zoo)
    

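The updated solution can probably be shortened. This is a minimal sketch only, assuming made-up sample values and a six-minute grid in place of the real csv data; the names t0, grid, merged and filled are illustrative. It merges both series with a zero-width zoo object that carries the grid, fills gaps with na.locf, and finally drops any timestamps outside the session, since merging with all=TRUE keeps observations that fall outside the template.

```r
library(zoo)

# Hypothetical sample data: one observation of `a` falls before 08:30
t0 <- as.POSIXct("2011-03-17 08:30:00", tz = "UTC")
a <- zoo(c(0.5, 1, 2, 3), t0 + 60 * c(-1, 0, 1, 4))  # 08:29, 08:30, 08:31, 08:34
b <- zoo(c(10, 20), t0 + 60 * c(2, 3))               # 08:32, 08:33

# Minute grid, restricted to the session window (bounds inclusive here)
grid <- seq(t0, t0 + 60 * 5, by = "min")
hhmm <- format(grid, "%H:%M")
grid <- grid[hhmm >= "08:30" & hhmm <= "15:00"]

# A zero-width zoo object adds its index to the merge but no data column
merged <- merge(a, b, zoo(, grid), all = TRUE)

# Carry the last observation forward; na.rm = FALSE keeps leading NAs
filled <- na.locf(merged, na.rm = FALSE)

# Drop rows whose timestamps are not on the session grid (08:29 here)
filled <- filled[time(filled) %in% grid, ]
```

Filling before subsetting means a value observed just before 08:30 can be carried into the session if the first bars are missing; swap the last two steps if that is not wanted.
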
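1 Answer:

Answer 0 (score: 1)

Problem 1

?zoo explains in detail how duplicates are handled, but this is probably happening because there are duplicates in the dates created by strptime.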
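
One way to check for that before calling zoo is sketched below. The sample rows stand in for sec1 and are made up, and the names dup, n_dup, sec1_unique and dates_unique are illustrative. It assumes that keeping the first row for each timestamp is acceptable.

```r
# Hypothetical sample rows standing in for sec1 (timestamp, close)
sec1 <- data.frame(
  V1 = c("03/17/2011 08:30:00", "03/17/2011 08:31:00",
         "03/17/2011 08:31:00", "03/17/2011 08:33:00"),
  V2 = c(10.0, 10.1, 10.1, 10.3),
  stringsAsFactors = FALSE
)

# Parse the strings to POSIXct using the same format as in the question
sec1_dates <- as.POSIXct(sec1[, 1], format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# TRUE for every timestamp that repeats an earlier one
dup <- duplicated(sec1_dates)
n_dup <- sum(dup)

# Keep only the first row seen for each timestamp
sec1_unique <- sec1[!dup, ]
dates_unique <- sec1_dates[!dup]
```

If n_dup is greater than zero, the duplicates come from the input data rather than from zoo itself.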

Problem 2

You can subset zoo objects by time using [ together with which and time (see ?zoo), for example:

    t.zoo[which(strftime(time(t.zoo),"%H:%M")>"08:30" & strftime(time(t.zoo),"%H:%M")<"15:30")]
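
Note that the strict > and < comparisons exclude the boundary minutes themselves. The sketch below applies the same time-of-day filter to a plain minute sequence, assuming the endpoints should be included and using the 08:30 to 15:00 window from the question. The two dates and the names grid, hhmm and session are made up for illustration.

```r
# Minute grid spanning two hypothetical days (UTC avoids DST surprises)
grid <- seq(as.POSIXct("2011-03-17 00:00:00", tz = "UTC"),
            as.POSIXct("2011-03-18 23:59:00", tz = "UTC"),
            by = "min")

# Zero-padded "HH:MM" strings sort in the same order as the times they represent
hhmm <- format(grid, "%H:%M")

# Inclusive bounds keep the 08:30 and 15:00 minutes themselves
session <- grid[hhmm >= "08:30" & hhmm <= "15:00"]

length(session)  # 391 minutes per day over two days
```

The same logical condition can be computed from time(t.zoo) to subset a zoo object, and computing the formatted strings once avoids calling strftime twice.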

Problem 3

Use c to combine them: t.zoo <- c(a,b)
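
For comparison, here is a small sketch with made-up values showing that merge aligns two zoo series on the union of their indexes even when their lengths differ, so a size mismatch alone should not make it fail. As far as I know, c on zoo objects concatenates them along time instead, which is a different operation. The names t1, m and filled are illustrative.

```r
library(zoo)

# Two hypothetical series with different lengths and partly different times
t1 <- as.POSIXct("2011-03-17 08:30:00", tz = "UTC")
a <- zoo(c(1, 2, 3), t1 + 60 * c(0, 1, 3))  # 08:30, 08:31, 08:33
b <- zoo(c(10, 20), t1 + 60 * c(1, 2))      # 08:31, 08:32

# all = TRUE keeps every timestamp from either series, with NA where one is missing
m <- merge(a, b, all = TRUE)

# Carry the last observation forward; na.rm = FALSE keeps the leading NA in b
filled <- na.locf(m, na.rm = FALSE)
```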