Why do I get different correlation results from cor() and ccf()?
library(xts)
> set.seed(123)
> ts1 = xts(1:100, as.POSIXlt(1366039619, tz="", origin="1970-01-01") + rnorm(100, 0, 3))
> ts2 = xts(1:100, as.POSIXlt(1366039619, tz="", origin="1970-01-01") + rnorm(100, 0, 3))
> as.vector(ccf(as.integer(ts1[,1]), as.integer(ts2[,1]), lag.max =10, plot =F, na.action=na.pass)$acf)
[1] -0.13747975 -0.00747975 -0.09497750 -0.01031203 -0.07564956 0.19881488 -0.11353135 0.01673867 0.12900690 0.00059706 -0.09642964 0.20852985 0.02476448 0.00126913 -0.03467147 -0.04284728 -0.05561356
[18] 0.08875188 0.01587159 -0.04449745 0.01002100
> sapply(seq(-10, 10), function(x, ts1, ts2) { cor(ts1[,1], lag(ts2[,1], x), use="complete.obs") }, ts1, ts2)
[1] -0.154055651 -0.008411318 -0.104222576 -0.011595184 -0.082495425 0.210464976 -0.118454928 0.018112365 0.132716811 0.000694595 -0.096429643 0.209312640 0.025156993 0.001450175 -0.035451383
[16] -0.043902825 -0.057842616 0.093863686 0.017485161 -0.047042779 0.011511559
> sapply(seq(-10, 10), function(x, ts1, ts2) { cor(ts1[,1], lag(ts2[,1], x), use="complete.obs") }, ts1, ts2) - as.vector(ccf(as.integer(ts1[,1]), as.integer(ts2[,1]), lag.max =10, plot =F, na.action=na.pass)$acf)
[1] -0.0165759032546357876203 -0.0009315701778466996610 -0.0092450780124607306876 -0.0012831523310935632337 -0.0068458595845764941279 0.0116500945970494651505 -0.0049235745757881255180
[8] 0.0013736907995123247284 0.0037099107611970050247 0.0000975349354166987759 -0.0000000000000000277556 0.0007827869094209904954 0.0003925162566637135919 0.0001810479989895477041
[15] -0.0007799161627975795263 -0.0010555407353524254299 -0.0022290547145371181204 0.0051118107350296843050 0.0016135741880074876142 -0.0025453295798825298357 0.0014905566679348520448
Update
Since ccf() uses acf(), the discrepancy can be reduced to:
> as.vector(acf(c(42, 5, 65437, 23), plot=F, lag.max=1)$acf)
[1] 1.000000 -0.416954
> cor(c(42, 5, 65437, 23), c(NA, 42, 5, 65437), use="pairwise.complete.obs")
[1] -0.500218
> cor(c(42, 5, 65437, 23), c(5, 65437, 23, NA), use="pairwise.complete.obs")
[1] -0.500218
Answer 0 (score: 9)
A couple of things differ between cor() and acf() in your examples. Let's pick a more manageable (and already demeaned) example:
x = c(-2,-1,0,1,2)
acf(x, plot = F, lag.max = NULL)
# Autocorrelations of series ‘x’, by lag
# 0 1 2 3 4
# 1.0 0.4 -0.1 -0.4 -0.4
Here is how acf() arrives at these values, taking lag = 2 as an example:
# products of overlapping terms only, divided by the full-series sum of squares
acf_lag_2 = sum(x*c(x[c(-1,-2)],NA,NA), na.rm = T) /
            sqrt(sum(x*x)*sum(x*x))  # = -1 / 10 = -0.1
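The lag-2 expression generalizes to any lag. Below is a minimal sketch, assuming a series with no missing values and a lag smaller than the series length. The name manual_acf is a hypothetical helper, not part of R, and it demeans explicitly so that it also works for series that are not already centered.

```r
# Hypothetical helper (not part of R): sample autocorrelation at lag k,
# computed the same way as the hand-written lag-2 expression.
manual_acf <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)                              # demean once, with the full-series mean
  num <- sum(d[seq_len(n - k)] * d[(1 + k):n])  # products of overlapping terms only
  num / sum(d * d)                              # normalize by the full-series sum of squares
}

x <- c(-2, -1, 0, 1, 2)
manual  <- sapply(0:4, function(k) manual_acf(x, k))
builtin <- as.vector(acf(x, plot = FALSE, lag.max = 4)$acf)

manual                      # 1.0  0.4 -0.1 -0.4 -0.4
all.equal(manual, builtin)  # TRUE
```

Note that the denominator never changes with k: the same full-series sum of squares is used even though fewer terms enter the numerator at larger lags.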
Contrast this with what the cor() construct does:
cor(x, c(0,1,2,NA,NA), use="pairwise.complete.obs") # = cor(c(-2,-1,0), c(0,1,2)) = 1
cor_lag_2 = sum((c(-2,-1,0)+1)*(c(0,1,2)-1)) / # recall cor needs to demean both vectors
sqrt(sum(c(-1,0,1)*c(-1,0,1))*sum(c(-1,0,1)*c(-1,0,1)))
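So acf() demeans only once, at the very start, using the mean of the whole series, and normalizes by the whole-series variance at every lag, whereas cor() demeans and normalizes separately for each lag, using only the overlapping observations.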
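As a check, the same two recipes can be applied by hand to the four-element series from the question's update. This is only a sketch; the variable names (y, acf_style, cor_style) are chosen here for illustration.

```r
y <- c(42, 5, 65437, 23)
n <- length(y)

# acf-style: one global mean and one global sum of squares
d <- y - mean(y)
acf_style <- sum(d[-n] * d[-1]) / sum(d * d)

# cor-style: each overlapping piece gets its own mean and its own variance
a <- y[-n]  # series without its last element
b <- y[-1]  # series without its first element
cor_style <- sum((a - mean(a)) * (b - mean(b))) /
  sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2))

acf_style  # -0.416954, the lag-1 value reported by acf() in the question
cor_style  # -0.500218, the value reported by cor() in the question

# Cross-check against the built-in functions
all.equal(acf_style, acf(y, plot = FALSE, lag.max = 1)$acf[2])
all.equal(cor_style, cor(a, b))
```

The gap is large here because the single outlier 65437 dominates the global mean, while each three-element piece used by cor() has a noticeably different mean and variance of its own.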