Calculating correlation conditionally

Time: 2014-10-26 22:29:43

Tags: r data.table aggregate apply correlation

I have a table that looks like this:

Year Class Value1  Value2  Value3
2006    A    45      27      96
2007    A    74      45      26
2008    C    74      41      78 
2009    D    56      65      45
2010    C    12      14      15
2011    A    25      85      50 
2012    B    26      45      12
2013    C    15      23      29
2014    D    86      36      53

How can I find the correlation between Value1 and Value2, and between Value1 and Value3, across all rows?

I tried this for Value1 and Value2:

cor <- data[,list(correlation=cor(Value1,Value2)),by=list(Year, Class)]

But I get this error:

Error in `[.data.frame`(data, , list(correlation = cor(Value1, Value2)),  : 
  unused argument (by = list(Year, Class))

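1 Answer:

Answer 0: (score: 1)

Here is an approach that returns a list in which each element is the correlation matrix for one value of Class. It assumes the table in your question is a data frame named dat:

Adapted from this CrossValidated answer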

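library(plyr)

# Correlation matrix of the value columns (drops Year and Class, columns 1 and 2)
corrFunc <- function(dat) {
  return(data.frame(cor(dat[,-c(1,2)])))
}

# Split dat by Class, apply corrFunc to each piece, and collect the results in a list
corr.list = dlply(dat, .(Class), corrFunc)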

This is what the output looks like:

$A
           Value1     Value2     Value3
Value1  1.0000000 -0.5920024 -0.4347386
Value2 -0.5920024  1.0000000 -0.4684250
Value3 -0.4347386 -0.4684250  1.0000000

$B
       Value1 Value2 Value3
Value1     NA     NA     NA
Value2     NA     NA     NA
Value3     NA     NA     NA

$C
          Value1    Value2    Value3
Value1 1.0000000 0.9580847 0.9855342
Value2 0.9580847 1.0000000 0.9927778
Value3 0.9855342 0.9927778 1.0000000

$D
       Value1 Value2 Value3
Value1      1     -1      1
Value2     -1      1     -1
Value3      1     -1      1

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  Class
1     A
2     B
3     C
4     D
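
The error in the question mentions `[.data.frame`, which suggests that data was a plain data frame, so the data.table-style by= argument was not recognized. As a rough sketch of the same idea in data.table (the names dt, res, cor12 and cor13 are made up for illustration, and the sample data is copied from the question), converting first and grouping by Class alone might look like this:

```r
library(data.table)

# Sample data copied from the table in the question
dat <- data.frame(
  Year   = 2006:2014,
  Class  = c("A", "A", "C", "D", "C", "A", "B", "C", "D"),
  Value1 = c(45, 74, 74, 56, 12, 25, 26, 15, 86),
  Value2 = c(27, 45, 41, 65, 14, 85, 45, 23, 36),
  Value3 = c(96, 26, 78, 45, 15, 50, 12, 29, 53)
)

# by= is only understood by data.table's `[`, so convert the data frame first
dt <- as.data.table(dat)

# Group by Class only: also grouping by Year would leave one row per group,
# and a correlation cannot be estimated from a single observation
res <- dt[, .(cor12 = cor(Value1, Value2),
              cor13 = cor(Value1, Value3)),
          by = Class]

print(res)
```

The pairwise values should agree with the corresponding entries of the matrices above. Class B has a single row, so its correlations are NA, and class D has only two rows, so its correlations are exactly 1 or -1.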