Question

I have a data set, for example:

       name      Exp1Res1   Exp1Res2   Exp1Res3   Exp2Res1   Exp2Res2   Exp2Res3
[1]     ID1         5          7          9          7          9          2
[2]     ID2         6          4          2          9          5          1
[3]     ID3         4          9          9          9          11         2

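For each row, I need to determine the correlation between experiment 1 and experiment 2. Because my real data set (FullSet) has 37 columns and 100,000 rows, my original loop-based solution (see below) is too slow, so I would like to optimize it.

My original solution was: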

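# Preallocate one output row per row of FullSet: ID plus the two coefficients
df <- data.frame(matrix(ncol = 3, nrow = nrow(FullSet)))
names(df) <- c("ID", "pearson", "spearman")
for (i in seq_len(nrow(FullSet)))
{
    # Columns 2:19 hold experiment 1, columns 20:37 hold experiment 2
    x <- as.numeric(t(FullSet[i, 2:19]))
    y <- as.numeric(t(FullSet[i, 20:37]))
    pears <- cor(x, y, method = "pearson")
    spear <- cor(x, y, method = "spearman")
    # A list keeps the ID as text and the coefficients numeric
    df[i, ] <- list(as.character(FullSet[i, 1]), pears, spear)
}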

I feel that something like this should work:

FullSet$pearson<-cor(as.numeric(t(FullSet[,2:19])),as.numeric(t(FullSet[,20:37])), method="pearson")

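But I don't know whether, or how, to reference the current row inside the transpose:

 t(FullSet[,2:19]), which should read something like t(FullSet[<currow>,2:19]).

Any help would be much appreciated. I am not sure whether my approach is even the right one.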

The output should look like the following (the values are not correct, they only illustrate the layout):

       name      Pearson     Spearman
[1]     ID1         0.8          0.75
[2]     ID2         0.9          0.8
[3]     ID3         0.85         0.7

Answer 1

How about reshaping it into this format:

ID  EXP  Res
1    1    .
1    1    .
1    2    .
1    2    .

Use reshape, and then let plyr do the work:

require(plyr)
ddply(df, .(ID, EXP), summarize, cor(...))
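
To make the long-format idea concrete, here is a minimal base R sketch on the three example rows from the question. Several things in it are assumptions rather than part of the answer: the column names follow the pattern Exp<k>Res<j>, the helper column `Rep` (the replicate index) is introduced only for illustration, and the grouping is by ID alone, because both experiments have to sit in the same group before they can be correlated. It also swaps in base `split()` and `lapply()` for `ddply`.

```r
# Example rows from the question (wide format, one row per ID)
wide <- data.frame(
  name = c("ID1", "ID2", "ID3"),
  Exp1Res1 = c(5, 6, 4), Exp1Res2 = c(7, 4, 9), Exp1Res3 = c(9, 2, 9),
  Exp2Res1 = c(7, 9, 9), Exp2Res2 = c(9, 5, 11), Exp2Res3 = c(2, 1, 2),
  stringsAsFactors = FALSE
)

# Stack the measurement columns; 'ind' holds the source column name
stacked <- stack(wide[-1])
col_name <- as.character(stacked$ind)

# Long format: ID, experiment number, replicate number, value
long <- data.frame(
  ID  = rep(wide$name, times = ncol(wide) - 1),
  EXP = as.integer(sub("^Exp([0-9]+)Res[0-9]+$", "\\1", col_name)),
  Rep = as.integer(sub("^Exp[0-9]+Res([0-9]+)$", "\\1", col_name)),
  Res = stacked$values,
  stringsAsFactors = FALSE
)

# One group per ID; pair the replicates of experiment 1 and 2 by Rep
per_id <- lapply(split(long, long$ID), function(d) {
  d <- d[order(d$EXP, d$Rep), ]
  x <- d$Res[d$EXP == 1]
  y <- d$Res[d$EXP == 2]
  data.frame(
    ID = d$ID[1],
    pearson  = cor(x, y, method = "pearson"),
    spearman = cor(x, y, method = "spearman"),
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, per_id)
rownames(result) <- NULL
print(result)
```

With plyr loaded, the same per-ID function could probably be passed to `ddply(long, .(ID), ...)` instead of the `split()`/`lapply()` pair. Whether the reshape pays off on 100,000 rows is something to measure rather than assume.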

Would that be possible, if you do it separately for Spearman and Pearson?
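
For comparison, a sketch that skips the reshape and computes both coefficients for all rows at once with matrix arithmetic. This is not what the answer above proposes; it is an alternative shown under assumptions: the helpers `row_cor` and `row_ranks` are hypothetical names, `demo` is random stand-in data with the same layout as FullSet (ID in column 1, experiment 1 in columns 2:19, experiment 2 in columns 20:37), and there are no missing values. Spearman is obtained as the Pearson correlation of the within-row ranks.

```r
# Row-wise Pearson correlation between two equally sized numeric matrices
row_cor <- function(a, b) {
  a <- a - rowMeans(a)  # center each row
  b <- b - rowMeans(b)
  rowSums(a * b) / sqrt(rowSums(a^2) * rowSums(b^2))
}

# Rank the values within each row (ties get average ranks, as in cor())
row_ranks <- function(m) t(apply(m, 1, rank))

# Stand-in data: 5 rows, 1 ID column + 36 measurement columns
set.seed(42)
demo <- data.frame(name = paste0("ID", 1:5),
                   matrix(rnorm(5 * 36), nrow = 5),
                   stringsAsFactors = FALSE)

e1 <- as.matrix(demo[, 2:19])   # experiment 1
e2 <- as.matrix(demo[, 20:37])  # experiment 2

res <- data.frame(
  name     = demo$name,
  pearson  = row_cor(e1, e2),
  spearman = row_cor(row_ranks(e1), row_ranks(e2))
)

# Cross-check against the per-row cor() calls used in the question's loop
check_p <- sapply(seq_len(nrow(demo)), function(i) cor(e1[i, ], e2[i, ]))
check_s <- sapply(seq_len(nrow(demo)),
                  function(i) cor(e1[i, ], e2[i, ], method = "spearman"))
print(res)
```

On the real data the two `as.matrix()` calls would take `FullSet[, 2:19]` and `FullSet[, 20:37]`. The cross-check loop is only there to confirm that the vectorized result agrees with `cor()` and can be dropped afterwards.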

R - Correlation between subsets of columns - referencing the current row
