Performing a Spearman correlation on all data pairs contained in two columns (x, y)?

Asked: 2016-08-17 23:07:05

Tags: r correlation

My data, in .csv format, looks like this:

sampleid    blue            red             otuid
AB1      0.001020366       0.000262013      K00001
AB1      7.24E-05          0.00000307       K00002
AB1      0.000500854       0.000635104      K00003
AB1      3.50E-05          0.000000555      K00004
AB1      0.000196537       0.0000346        K00005
AB1      2.56E-05          2.92E-08         K00006
AB1      0.00027525        0.0000392        K00007
AB1      0.000177602       0.000000994      K00008
AB1      0.000128098       0.000151901      K00009
AB1      1.46E-06          0.000000468      K00010
AB1      0.000348187       0.000571836      K00011
AB1      0.000448518       0.000435364      K00012
AB1      0.000490293       0.000729903      K00013
AB1      0.000263668       0.00000567       K00014
AB1      0.00054961        0.000406697      K00015
AB2      0.001020366       0.000262013      K00001
AB2      7.24E-05          0.00000307       K00002
AB2      0.000500854       0.000635104      K00003
AB2      3.50E-05          0.000000555      K00004
AB2      0.000196537       0.0000346        K00005
AB2      2.56E-05          2.92E-08         K00006
AB2      0.00027525        0.0000392        K00007
AB2      0.000177602       0.000000994      K00008
AB2      0.000128098       0.000151901      K00009
AB2      1.46E-06          0.000000468      K00010
AB2      0.000348187       0.000571836      K00011
AB2      0.000448518       0.000435364      K00012
AB2      0.000490293       0.000729903      K00013
AB2      0.000263668       0.00000567       K00014
AB2      0.00054961        0.000406697      K00015

When I run cor() like this:

d <- read.csv("name.csv")
# Pearson correlation of the ranks, i.e. Spearman's rho over all rows at once
cor(rank(d[,3]), rank(d[,4]))
[1] 0.777888

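I assume this is the average R over all the correlation tests, but I would prefer to get an individual R for each sample / each OTU tested (X vs. Y), so that I can write a table that looks like this: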

otuid sampleid Spearman's R
k00001 Sample1  0.001
k00002 Sample1  0.012
k00003 Sample1  0.013
k00004 Sample1  0.015 ......

k00001 Sample2 0.001
k00002 Sample2  0.012
k00003 Sample2  0.013
k00004 Sample2  0.015

Thanks for your help!

A data.frame to speed things up:

# Two samples with 15 OTUs each, matching the layout of the .csv above
sampleid <- rep(c("AB1", "AB2"), each = 15)
# Random placeholder values; no seed is set, so results change on every run
red <- runif(30, 0, 100)
blue <- runif(30, 0, 100)
# OTU ids K00001 ... K00015, repeated once per sample
otuid <- rep(sprintf("K%05d", 1:15), times = 2)
df <- data.frame(sampleid, red, blue, otuid)
df

1 answer:

Answer 0 (score: 1)

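Based on your comment, and using the data frame you provided, you can compute the correlation within each sample with the purrr package, like this: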

library(purrr)

df %>% 
  split(.$sampleid) %>% 
  map_dbl(~ cor(.$blue, .$red))
#>        AB1        AB2 
#> 0.07714403 0.38077482
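
Note that cor() computes Pearson's coefficient by default, whereas the question asks for Spearman's. A minimal sketch of the same pipeline with method = "spearman" follows; the seed and the rebuilt data frame are assumptions made only so the example is self-contained, and the numbers will not match the output above. It also checks that this agrees with the rank-based call used in the question.

```r
library(purrr)

set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
df <- data.frame(
  sampleid = rep(c("AB1", "AB2"), each = 15),
  red      = runif(30, 0, 100),
  blue     = runif(30, 0, 100)
)

# Spearman's rho within each sample
rho <- df %>%
  split(.$sampleid) %>%
  map_dbl(~ cor(.$blue, .$red, method = "spearman"))

# Same quantity computed as Pearson's correlation of the ranks,
# mirroring the cor(rank(...), rank(...)) call from the question
rho_ranks <- df %>%
  split(.$sampleid) %>%
  map_dbl(~ cor(rank(.$blue), rank(.$red)))

rho
all.equal(rho, rho_ranks)
```

The value obtained in the question is therefore a single Spearman coefficient over all rows pooled together, not an average of per-sample coefficients.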

Here is a base R way to get something similar:

by(df, df$sampleid, function(x) cor(x$blue, x$red))
#> df$sampleid: AB1
#> [1] 0.205726
#> -------------------------------------------------------- 
#> df$sampleid: AB2
#> [1] 0.3237938
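
To get closer to the table requested in the question, the per-sample coefficients can be collected into a data frame. The following is a sketch in base R, assuming Spearman's method is wanted; the seed and the column name spearman_rho are illustrative choices, not part of the original post. With a single (blue, red) pair per OTU within a sample, a correlation cannot be computed per OTU and sample, so the sketch returns one coefficient per sample.

```r
set.seed(42)  # arbitrary seed (an assumption), only for reproducibility
df <- data.frame(
  sampleid = rep(c("AB1", "AB2"), each = 15),
  red      = runif(30, 0, 100),
  blue     = runif(30, 0, 100),
  otuid    = rep(sprintf("K%05d", 1:15), times = 2)
)

# One Spearman coefficient per sample, computed over that sample's 15 OTUs
rho <- sapply(split(df, df$sampleid),
              function(g) cor(g$blue, g$red, method = "spearman"))

# Collect the result as a two-column table: sample id and its coefficient
result <- data.frame(sampleid = names(rho),
                     spearman_rho = unname(rho),
                     stringsAsFactors = FALSE)
print(result)
```

If the real data contain several samples per OTU, splitting on df$otuid instead of df$sampleid in the same call would give one coefficient per OTU across samples.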