Merging xts time series in R produces duplicated times

Date: 2017-10-29 22:49:49

Tags: r date merge xts posixct

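I have never found an efficient way to solve a problem I run into every time I try to combine time series data from different sources. By different sources, I mean combining data from the internet (Yahoo stock prices) with a local csv time series.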

yahoo.xts  # variable containing security prices from yahoo
local.xts  # local time series data 
cbind(yahoo.xts,local.xts)  # combine them

The result looks like this:

[Screenshot of the printed merge result; image not preserved]

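I get a combined xts object in which a given date appears with different times of day. What I want is to ignore the time of day and align the rows by date. My workaround has been to extract the index of each data source, convert it with as.Date, and then rewrap the data as xts objects. My question is whether I am missing a better, more efficient way.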

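Note: I am not sure how to provide a good example of the local data source so that you can reproduce the problem, but here is a snippet showing how I get the data from online.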

require(quantmod)
data.etf = new.env()
getSymbols.av(c('XOM','AAPL'), src="av", api.key="your-own-key",from = '1970-01-01',adjusted=TRUE,
            output.size="full",env = data.etf,  set.symbolnames = T, auto.assign = T)
yahoo.xts = Cl(data.etf$XOM)

Here is some of the data:

Yahoo:

structure(c(112.68, 109.2, 107.86, 104.35, 104.68, 110.66), class = c("xts", 
"zoo"), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", 
"POSIXt"), .indexTZ = "America/Chicago", tzone = "America/Chicago", index = structure(c(1508457600, 
1508716800, 1508803200, 1508889600, 1508976000, 1509062400), tzone = "America/Chicago", tclass = c("POSIXct", 
"POSIXt")), .Dim = c(6L, 1L), .Dimnames = list(NULL, "XIV"))

Local data:

structure(c(0.176601541324807, -0.914132074513824, -0.0608652702022332, 
-0.196679777210441, -0.190397155984135, 0.915313388202916, -0.0530280808936784, 
0.263895885521142, 0.10844973759151, 0.0547864992300319, 0.0435149080877898, 
-0.202388932508539, 0.0382888645282672, -0.00800908217028123, 
-0.0798424223984417, 0.00268898461896916, 0.00493307845560457, 
0.132697099147406, 0.074267173330532, -0.336299384720176, -0.0859815663679892, 
-0.0597168456705514, -0.0867777000321366, 0.283394650847026, 
-0.0100414455118704, 0.106355723615723, -0.0640682814821423, 
0.0481841070155836, -0.00321273561708742, -0.13182105331959), .indexCLASS = c("POSIXct", 
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = structure("America/Chicago", .Names = "TZ"), tzone = structure("America/Chicago", .Names = "TZ"), class = c("xts", 
"zoo"), na.action = structure(1L, class = "omit", index = 1080540000), index = structure(c(1508475600, 
1508734800, 1508821200, 1508907600, 1508994000, 1509080400), tzone = structure("America/Chicago", .Names = "TZ"), tclass = c("POSIXct", 
"POSIXt")), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("D.30", 
"D.60", "D.90", "D.120", "D.150")))

1 answer:

Answer 0 (score: 3)

If you understand the source of the problem, you may be able to avoid it in the first place.

Your problem is that the 19:00:00 stamps in the printed merge result are UTC dates (that is, midnight UTC) converted to "America/Chicago" POSIXct timestamps.
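To make that concrete, here is a small base R sketch. It takes the first index value from the Yahoo data in the question (1508457600 seconds since the epoch) and prints the same instant in both timezones. The variable names are only illustrative.

```r
# Midnight UTC on 2017-10-20, the first index value in the Yahoo data above
utc_midnight <- as.POSIXct(1508457600, origin = "1970-01-01", tz = "UTC")

# The same instant rendered in each timezone
as_utc     <- format(utc_midnight, "%Y-%m-%d %H:%M:%S", tz = "UTC")
as_chicago <- format(utc_midnight, "%Y-%m-%d %H:%M:%S", tz = "America/Chicago")

print(as_utc)      # "2017-10-20 00:00:00"
print(as_chicago)  # "2017-10-19 19:00:00"
```

Chicago was on daylight time (UTC-5) on that date, so midnight UTC shows up as 19:00:00 on the previous evening.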

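As you point out, one solution is to build new xts time indexes in Date format, but that is tedious. It is better to avoid the situation in the first place if you can; otherwise you have to convert the Date-indexed series to a POSIXct-indexed series with the appropriate timezone.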

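When xts objects holding date data (or, more precisely, what you think of as date data) fail to line up, the key thing to understand is that the timezones of the objects do not agree. If the timezones of the xts time indexes match, you get a correct merge with no surprises. Date objects have no timezone, of course, and by default they are given the timezone "UTC" when they are merged with an xts object whose time index is of class POSIXct.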
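A minimal, network-free sketch of that mismatch, using made-up stand-in series (the names `prices` and `local_series` and their values are hypothetical, not your data). It assumes the xts package is installed and compares the underlying numeric indexes directly:

```r
library(xts)

days <- as.Date(c("2017-10-25", "2017-10-26", "2017-10-27"))

# Date-indexed series (stand-in for the downloaded prices)
prices <- xts(c(83.17, 83.47, 83.71), order.by = days)

# POSIXct-indexed series at local midnight (stand-in for the local data)
local_series <- xts(c(-0.20, -0.19, 0.92),
                    order.by = as.POSIXct(as.character(days),
                                          tz = "America/Chicago"))

# xts stores every index as seconds since the epoch; a Date index sits at
# midnight UTC, so the two indexes differ by the Chicago UTC offset
offset_hours <- (as.numeric(.index(local_series)) -
                 as.numeric(.index(prices))) / 3600
print(offset_hours)  # 5 5 5

# No timestamps match, so the outer join keeps every row from both sides
merged <- merge(local_series, prices)
print(nrow(merged))  # 6
```

Three calendar days turn into six rows because each day appears once at midnight UTC and once at midnight Chicago time.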

# reproduce your data (your code isn't fully reproducible for me):

require(quantmod)
data.etf = new.env()
getSymbols(c('XOM','AAPL'), src="yahoo", api.key="your-own-key",from = '1970-01-01',adjusted=TRUE,output.size="full",env = data.etf,  set.symbolnames = T, auto.assign = T)
yahoo.xts = Cl(data.etf$XOM)

z <- structure(c(0.176601541324807, -0.914132074513824, -0.0608652702022332, 
-0.196679777210441, -0.190397155984135, 0.915313388202916, -0.0530280808936784, 
0.263895885521142, 0.10844973759151, 0.0547864992300319, 0.0435149080877898, 
-0.202388932508539, 0.0382888645282672, -0.00800908217028123, 
-0.0798424223984417, 0.00268898461896916, 0.00493307845560457, 
0.132697099147406, 0.074267173330532, -0.336299384720176, -0.0859815663679892, 
-0.0597168456705514, -0.0867777000321366, 0.283394650847026, 
-0.0100414455118704, 0.106355723615723, -0.0640682814821423, 
0.0481841070155836, -0.00321273561708742, -0.13182105331959), .indexCLASS = c("POSIXct", 
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = structure("America/Chicago", .Names = "TZ"), tzone = structure("America/Chicago", .Names = "TZ"), class = c("xts", 
"zoo"), na.action = structure(1L, class = "omit", index = 1080540000), index = structure(c(1508475600, 
1508734800, 1508821200, 1508907600, 1508994000, 1509080400), tzone = structure("America/Chicago", .Names = "TZ"), tclass = c("POSIXct", 
"POSIXt")), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("D.30", 
"D.60", "D.90", "D.120", "D.150")))

# inspect the index classes and timezones:
class(index(z))
# [1] "POSIXct" "POSIXt" 
class(index(yahoo.xts))
# [1] "Date"

indexTZ(z)
# TZ 
# "America/Chicago" 
indexTZ(yahoo.xts)
# [1] "UTC"

You can see that yahoo.xts uses the Date class. When it is merged with a POSIXct-indexed object (i.e. with z), its index is converted to "UTC" timestamps.

# Let's see what happens if the timezone of the yahoo.xts2 object is the same as z:
yahoo.xts2 <- xts(coredata(yahoo.xts), order.by = as.POSIXct(as.character(index(yahoo.xts)), tz = "America/Chicago"))

str(yahoo.xts2)
# An ‘xts’ object on 1970-01-02/2017-10-27 containing:
#     Data: num [1:12067, 1] 1.94 1.97 1.96 1.95 1.96 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr "XOM.Close"
# Indexed by objects of class: [POSIXct,POSIXt] TZ: America/Chicago
# xts Attributes:  
#     NULL


u2 <- merge(z,yahoo.xts2)
tail(u2)
class(index(u2))
# [1] "POSIXct" "POSIXt" 

tail(u2, 3)
# D.30        D.60        D.90       D.120        D.150 XOM.Close
# 2017-10-25 -0.1966798  0.05478650 0.002688985 -0.05971685  0.048184107     83.17
# 2017-10-26 -0.1903972  0.04351491 0.004933078 -0.08677770 -0.003212736     83.47
# 2017-10-27  0.9153134 -0.20238893 0.132697099  0.28339465 -0.131821053     83.71

Now everything is as expected.

A shortcut you may find useful is:

z3 <- as.xts(as.data.frame(z), dateFormat="Date")
tail(merge(z3, yahoo.xts))

# D.30        D.60         D.90       D.120        D.150 XOM.Close
# 2017-10-20  0.17660154 -0.05302808  0.038288865  0.07426717 -0.010041446     83.11
# 2017-10-23 -0.91413207  0.26389589 -0.008009082 -0.33629938  0.106355724     83.24
# 2017-10-24 -0.06086527  0.10844974 -0.079842422 -0.08598157 -0.064068281     83.47
# 2017-10-25 -0.19667978  0.05478650  0.002688985 -0.05971685  0.048184107     83.17
# 2017-10-26 -0.19039716  0.04351491  0.004933078 -0.08677770 -0.003212736     83.47
# 2017-10-27  0.91531339 -0.20238893  0.132697099  0.28339465 -0.131821053     83.71

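This converts to a data.frame and then back to xts with the appropriate argument, dateFormat="Date". You are now working with xts objects whose time index is of class Date, so there are no timezone issues: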

class(index(merge(z3, yahoo.xts)))
#[1] "Date"
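A variant of the same idea, closer to the as.Date approach described in the question, is to replace the POSIXct index in place. This is only a sketch on made-up stand-in series (`local_series` and `prices` are hypothetical), and it assumes the local timestamps are meant to be read in "America/Chicago". Passing `tz` explicitly to as.Date avoids depending on its default:

```r
library(xts)

# Hypothetical stand-in for the local series: POSIXct index at Chicago midnight
stamps <- as.POSIXct(c("2017-10-25", "2017-10-26", "2017-10-27"),
                     tz = "America/Chicago")
local_series <- xts(c(-0.20, -0.19, 0.92), order.by = stamps)
colnames(local_series) <- "D.30"

# Hypothetical stand-in for the downloaded prices: Date index
prices <- xts(c(83.17, 83.47, 83.71),
              order.by = as.Date(c("2017-10-25", "2017-10-26", "2017-10-27")))
colnames(prices) <- "XOM.Close"

# Replace the POSIXct index with calendar dates, reading each timestamp
# in the series' own timezone rather than relying on the as.Date default
index(local_series) <- as.Date(index(local_series), tz = "America/Chicago")

merged <- merge(local_series, prices)
print(class(index(merged)))  # "Date"
print(nrow(merged))          # 3
```

Both series now share a Date index, so each calendar day produces a single merged row.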