Question

I need to compute pairwise, consecutive correlations for each date variable (there are 246 of them in my dataset):

Company   2009/08/21      2009/08/24       2009/08/25
A       -0.0019531250   -0.0054602184    -6.274510e-03
AA      -0.0063291139   -0.0266457680    -1.750199e-02
AAPL     0.0084023598   -0.0055294118    -1.770643e-04 ...
...

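So I want cor(col1, col2), cor(col2, col3), and so on, but not cor(col1, col3). I realize that if I wanted all combinations I could use the combn function, but I can't figure out how to restrict it to my case without something inefficient like a for loop.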
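The answers below all assume that every column is numeric. With the layout shown, the Company column has to be set aside first. A minimal sketch, using only the three rows shown above (the real data has 246 date columns); the names `num` and `consec` are illustrative:

```r
# The three rows shown in the question; backticks allow date-like column names.
dat <- data.frame(
  Company      = c("A", "AA", "AAPL"),
  `2009/08/21` = c(-0.0019531250, -0.0063291139, 0.0084023598),
  `2009/08/24` = c(-0.0054602184, -0.0266457680, -0.0055294118),
  `2009/08/25` = c(-6.274510e-03, -1.750199e-02, -1.770643e-04),
  check.names  = FALSE
)
# Keep the company names as row names, then drop the non-numeric Company column.
rownames(dat) <- dat$Company
num <- dat[, -1]

# Only neighbouring pairs (1,2), (2,3), ...; never (1,3).
n <- ncol(num)
consec <- sapply(seq_len(n - 1), function(i) cor(num[[i]], num[[i + 1]]))
names(consec) <- paste(names(num)[-n], names(num)[-1], sep = " vs ")
consec
```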

Answer 1

Method 1: You can do this:

lapply(1:(ncol(dat)-1), function(i) cor(dat[, i], dat[, i+1],   
   use="pairwise.complete.obs"))

For example, a data frame with 10 variables gives you 9 consecutive correlations, i.e. columns 1-2, 2-3, 3-4, etc., if that is what you want:

dat <- replicate(10, rnorm(10))  # 10 x 10 matrix of random numbers
lapply(1:(ncol(dat)-1), function(i) cor(dat[, i], dat[, i+1], use="pairwise.complete.obs"))
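lapply() returns a list with one element per pair. If a plain numeric vector is more convenient, vapply() can enforce exactly one numeric result per pair. A minimal sketch; the column labels V1 to V10 are hypothetical and added only so the output is labelled:

```r
set.seed(1)                           # reproducible random example
dat <- replicate(10, rnorm(10))       # 10 x 10 numeric matrix
colnames(dat) <- paste0("V", 1:10)    # hypothetical labels for readability

n <- ncol(dat)
# One numeric value per neighbouring pair (i, i + 1).
res <- vapply(seq_len(n - 1), function(i) {
  cor(dat[, i], dat[, i + 1], use = "pairwise.complete.obs")
}, numeric(1))
names(res) <- paste(colnames(dat)[-n], colnames(dat)[-1], sep = "-")
res
```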

Method 2 (very concise)

Using the iris dataset:

dat <- iris[, 1:4]
diag(cor(dat, use="pairwise.complete.obs")[, -1])
[1] -0.1175698 -0.4284401  0.9628654
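The diag() trick works because dropping the first column of the p x p correlation matrix shifts the entries (1,2), (2,3), ... onto the main diagonal. The same entries can also be read out directly with a two-column index matrix. A minimal sketch on the same iris columns; note that it still builds the full correlation matrix first:

```r
dat <- iris[, 1:4]
cm <- cor(dat, use = "pairwise.complete.obs")
p <- ncol(cm)
# Each row of idx is a (row, column) position: (1,2), (2,3), (3,4).
idx <- cbind(seq_len(p - 1), seq_len(p - 1) + 1)
superdiag <- cm[idx]
superdiag
```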

Answer 2

As you pointed out, combn is the way to go. Assuming your data.frame is called dat, then for consecutive columns try:

ind <- combn(ncol(dat), 2)
consecutive <- ind[ , apply(ind, 2, diff)==1]
lapply(1:ncol(consecutive), function(i) cor(dat[,consecutive[,i]]))
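combn() generates all choose(p, 2) column pairs and then keeps only the p - 1 neighbouring ones. Since those pairs are known in advance, the same 2-row index matrix can be built directly. A sketch, using iris as a stand-in for the question's data; the combn() version is kept only to show that both matrices agree:

```r
dat <- iris[, 1:4]                     # stand-in for the question's data
p <- ncol(dat)
# Row 1 holds the left column of each pair, row 2 the right one: 2 x (p - 1).
consecutive <- rbind(seq_len(p - 1), seq_len(p - 1) + 1L)

# The combn() construction from the answer, for comparison only.
ind <- combn(p, 2)
from_combn <- ind[, apply(ind, 2, diff) == 1]

# One correlation per column of the index matrix.
cors <- apply(consecutive, 2, function(j) cor(dat[, j[1]], dat[, j[2]]))
cors
```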

Consider this simple example:

> data(iris)
> dat <- iris[, 1:4]
> # changing colnames to see whether result is for consecutive columns
> colnames(dat) <- 1:ncol(dat)  
> head(dat)   # this is how the data looks
    1   2   3   4
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
> 
> ind <- combn(ncol(dat), 2)
> consecutive <- ind[ , apply(ind, 2, diff)==1]
> lapply(1:ncol(consecutive), function(i) cor(dat[,consecutive[,i]])) # output: cor matrix
[[1]]
           1          2
1  1.0000000 -0.1175698
2 -0.1175698  1.0000000

[[2]]
           2          3
2  1.0000000 -0.4284401
3 -0.4284401  1.0000000

[[3]]
          3         4
3 1.0000000 0.9628654
4 0.9628654 1.0000000

If you only want the correlation coefficients, use sapply:

> sapply(1:ncol(consecutive), function(i) cor(dat[,consecutive[,i]])[2,1])
[1] -0.1175698 -0.4284401  0.9628654

Answer 3

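Loops in R should generally be avoided, but I think they sometimes carry an undeserved stigma. In this case a loop is easier to read than a fancier vectorized function, and it is also quite efficient: any call like cor(mydata) computes all n^2 pairwise correlations of an n-column matrix, whereas the for loop computes only the n - 1 consecutive ones.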

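# Note: nrow=246 makes a 246 x 20000 matrix, so the loop below covers only the
# first 245 of its 19999 consecutive column pairs, while cor(x) has to build a
# 20000 x 20000 matrix (about 3.2 GB of doubles).
x = matrix( rnorm(246*20000), nrow=246 )
out = numeric(245)

system.time( { for( i in 1:245 )
                 out[i] = cor(x[,i],x[,i+1]) } )
# 0.022 seconds

system.time( diag(cor(x, use="pairwise.complete.obs")[, -1]) )
# Runs for 2 minutes and then crashes my R session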
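A loop is one way to avoid building the full n x n matrix. Another, assuming no missing values and no constant columns, is to standardize every column once and sum the products of neighbouring columns: for Pearson correlation, r between columns i and i+1 equals sum(z_i * z_{i+1}) / (nrow - 1) when z is scaled with the sample standard deviation. A minimal sketch; the function name consecutive_cor is hypothetical, and here the 246 date variables are the columns, as in the question:

```r
consecutive_cor <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # Centre each column and divide by its sample SD (denominator n - 1).
  z <- scale(x)
  # Pearson r for each neighbouring pair, computed in one vectorized step.
  colSums(z[, -p, drop = FALSE] * z[, -1, drop = FALSE]) / (n - 1)
}

set.seed(42)
x <- matrix(rnorm(20000 * 246), ncol = 246)   # 20000 rows, 246 columns
r_vec <- consecutive_cor(x)

# Cross-check against the loop approach from the answer above.
r_loop <- vapply(1:245, function(i) cor(x[, i], x[, i + 1]), numeric(1))
all.equal(r_vec, r_loop)
```

This trades the loop for a few whole-matrix operations and uses memory proportional to the data, not to the number of column pairs.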

Answer 4

First, I assume your data is stored in df.

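Here is what I would do. First, create a function that, for any given column number, computes the correlation between that column and the next one: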

cor.neighbour <- function(i) {
    j <- i + 1
    cr <- cor(df[, i], df[, j])

    # returning a one-row data frame makes the per-column results easy to stack later
    result <- data.frame(
        x = names(df)[i],
        y = names(df)[j],
        cor = cr,
        stringsAsFactors = FALSE
    )

    return(result)
}

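Then, to apply it to your whole dataset, I would first create a vector of all the column numbers I want to use, i.e. every column except the last. Then use lapply to process them: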

i <- 1:(ncol(df) - 1)
cor.pairs <- lapply(i, cor.neighbour)
# change the list of one-row data frames into a single data frame (base R)
cor.pairs <- do.call(rbind, cor.pairs)
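cor.neighbour() reads df from the global environment, so its result silently depends on whatever df happens to exist there. A variant that takes the data frame as an argument is easier to reuse. A minimal sketch, with the hypothetical name cor_neighbours, shown on the iris columns used in the other answers:

```r
cor_neighbours <- function(data) {
  p <- ncol(data)
  # One single-row data frame per neighbouring pair (i, i + 1).
  rows <- lapply(seq_len(p - 1), function(i) {
    data.frame(
      x = names(data)[i],
      y = names(data)[i + 1],
      cor = cor(data[[i]], data[[i + 1]]),
      stringsAsFactors = FALSE
    )
  })
  # Stack the rows into a single data frame.
  do.call(rbind, rows)
}

pairs_df <- cor_neighbours(iris[, 1:4])
pairs_df
```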

Pairwise operations in R
