Cross-correlation between multiple groups in a data.table

Time: 2014-04-11 01:03:58

Tags: r time-series data.table

I would like to compute the cross-correlation between groups of time series in a data.table. My time series data is in this format:

library(data.table)
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5)) , Y = rnorm(15) )

   group           Y
 1:    a  0.90855520
 2:    a -0.12463737
 3:    a -0.45754652
 4:    a  0.65789709
 5:    a  1.27632196
 6:    b  0.98483700
 7:    b -0.44282527
 8:    b -0.93169070
 9:    b -0.21878359
10:    b -0.46713392
11:    c -0.02199363
12:    c -0.67125826
13:    c  0.29263953
14:    c -0.65064603
15:    c -1.41143837

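Every group has the same number of observations. What I am looking for is a way to get the cross-correlation between each pair of groups: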

group.1   group.2    correlation
      a         b          0.xxx
      a         c          0.xxx
      b         c          0.xxx

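I am currently writing a script that subsets each group and appends the cross-correlations, but the data is fairly large. Is there an efficient / zen way to do this?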

1 answer:

Answer 0: (score: 4)

Does this help?

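# add a within-group row index so the series can be aligned
data[,id:=rep(1:5,3)]
# reshape to wide (one column per group), then drop the index
dtw  = dcast.data.table(data, id ~ group, value.var="Y" )[, id := NULL]
# pairwise correlation matrix between the group columns
cor(dtw)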

See "Correlation between groups in R data.table".
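
The matrix returned by cor(dtw) holds every pair twice plus the diagonal, whereas the question asks for one row per pair. A minimal sketch of one way to get there, assuming the same toy data as above; the names wide, cm, idx and res are only illustrative, and dcast() is used here as the generic for dcast.data.table:

```r
library(data.table)

set.seed(45L)
data <- data.table(group = rep(c("a", "b", "c"), each = 5), Y = rnorm(15))

# Row index within each group, so the series line up side by side.
data[, id := seq_len(.N), by = group]

# One column per group, then the full correlation matrix.
wide <- dcast(data, id ~ group, value.var = "Y")[, id := NULL]
cm <- cor(wide)

# Keep only the upper triangle, i.e. each unordered pair once.
idx <- which(upper.tri(cm), arr.ind = TRUE)
res <- data.table(group.1     = rownames(cm)[idx[, "row"]],
                  group.2     = colnames(cm)[idx[, "col"]],
                  correlation = cm[idx])
res
```

This keeps the heavy lifting in a single cor() call, which is likely cheaper than subsetting per pair when there are many groups, though that has not been benchmarked here.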


Another way is:

# data
set.seed(45L)
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5)) , Y = rnorm(15) )

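# method 2: pair each group with the "next" group (a-b, b-c, c-a)
setkey(data, "group")
# same rows reordered as b, c, a, so row i lines up with the next group's row i
data2 = data[J(c("b", "c", "a"))][, list(group2=group, Y2=Y)]
data[, c(names(data2)) := data2]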

data[, cor(Y, Y2), by=list(group, group2)]

#     group group2         V1
# 1:      a      b -0.2997090
# 2:      b      c  0.6427463
# 3:      c      a -0.6922734

And to generalize this "other" way to more than three groups:

data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5),rep("d",5)) ,
                   Y = rnorm(20) )
setkey(data, "group")

groups = unique(data$group)
ngroups = length(groups)
library(gtools)
pairs = combinations(ngroups,2,groups)

d1 = data[pairs[,1],,allow.cartesian=TRUE]
d2 = data[pairs[,2],,allow.cartesian=TRUE]
d1[,c("group2","Y2"):=d2]
d1[,cor(Y,Y2), by=list(group,group2)]
#    group group2          V1
# 1:     a      b  0.10742799
# 2:     a      c  0.52823511
# 3:     a      d  0.04424170
# 4:     b      c  0.65407400
# 5:     b      d  0.32777779
# 6:     c      d -0.02425053
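
If adding gtools as a dependency is not desirable, the same pair enumeration can probably be done with base R's combn(). This is only a sketch under that assumption, not the answer's original method; res, g1 and g2 are illustrative names, and it subsets per pair, so it may be slower than the join-based version on large data:

```r
library(data.table)

set.seed(45L)
data <- data.table(group = rep(c("a", "b", "c", "d"), each = 5),
                   Y = rnorm(20))

groups <- unique(data$group)
# One row per unordered pair of groups (base R, no gtools needed).
pairs <- t(combn(groups, 2))

# Correlate the two series of each pair and stack the results.
res <- rbindlist(lapply(seq_len(nrow(pairs)), function(i) {
  g1 <- pairs[i, 1]
  g2 <- pairs[i, 2]
  data.table(group  = g1,
             group2 = g2,
             V1     = cor(data[group == g1, Y], data[group == g2, Y]))
}))
res
```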