My data, in .csv format, looks like this:
sampleid blue red otuid
AB1 0.001020366 0.000262013 K00001
AB1 7.24E-05 0.00000307 K00002
AB1 0.000500854 0.000635104 K00003
AB1 3.50E-05 0.000000555 K00004
AB1 0.000196537 0.0000346 K00005
AB1 2.56E-05 2.92E-08 K00006
AB1 0.00027525 0.0000392 K00007
AB1 0.000177602 0.000000994 K00008
AB1 0.000128098 0.000151901 K00009
AB1 1.46E-06 0.000000468 K00010
AB1 0.000348187 0.000571836 K00011
AB1 0.000448518 0.000435364 K00012
AB1 0.000490293 0.000729903 K00013
AB1 0.000263668 0.00000567 K00014
AB1 0.00054961 0.000406697 K00015
AB2 0.001020366 0.000262013 K00001
AB2 7.24E-05 0.00000307 K00002
AB2 0.000500854 0.000635104 K00003
AB2 3.50E-05 0.000000555 K00004
AB2 0.000196537 0.0000346 K00005
AB2 2.56E-05 2.92E-08 K00006
AB2 0.00027525 0.0000392 K00007
AB2 0.000177602 0.000000994 K00008
AB2 0.000128098 0.000151901 K00009
AB2 1.46E-06 0.000000468 K00010
AB2 0.000348187 0.000571836 K00011
AB2 0.000448518 0.000435364 K00012
AB2 0.000490293 0.000729903 K00013
AB2 0.000263668 0.00000567 K00014
AB2 0.00054961 0.000406697 K00015
When I run cor() like this:
d <- read.csv("name.csv")
cor(rank(d[,3]), rank(d[,4]))
[1] 0.777888
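I assume this is the average R across all of the correlation tests, but I would prefer to get an individual R for each sample and for each OTU tested (X vs. Y), so that I can write a table that looks like this: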
otuid sampleid Spearman's R
k00001 Sample1 0.001
k00002 Sample1 0.012
k00003 Sample1 0.013
k00004 Sample1 0.015 ......
k00001 Sample2 0.001
k00002 Sample2 0.012
k00003 Sample2 0.013
k00004 Sample2 0.015
Thanks for your help!
Here is a data.frame to speed things up:
# 15 OTUs measured in each of two samples (30 rows in total)
sampleid <- rep(c("AB1", "AB2"), each = 15)
# random placeholder values, so results change from run to run
red <- runif(30, 0, 100)
blue <- runif(30, 0, 100)
otuid <- rep(sprintf("K%05d", 1:15), times = 2)
df <- data.frame(sampleid, red, blue, otuid)
df
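Answer 0 (score: 1)
Based on your comment, and using the data frame you provided, you can compute the correlation within each sample with the purrr package, like this: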
library(purrr)
df %>%
split(.$sampleid) %>%
map_dbl(~ cor(.$blue, .$red))
#> AB1 AB2
#> 0.07714403 0.38077482
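Note that cor() defaults to Pearson's method, while the question ranks both columns first, which corresponds to Spearman's correlation. The following is a minimal sketch of the same per-sample split with method = "spearman". The fixed seed and the rebuilt example data frame are assumptions added here only to make the snippet reproducible; they are not part of the original data.

```r
library(purrr)  # provides map_dbl() and re-exports the %>% pipe

set.seed(1)  # assumption: fixed seed so the random example data are reproducible
df <- data.frame(
  sampleid = rep(c("AB1", "AB2"), each = 15),
  red      = runif(30, 0, 100),
  blue     = runif(30, 0, 100),
  otuid    = rep(sprintf("K%05d", 1:15), times = 2)
)

# Spearman's rho within each sample
rho <- df %>%
  split(.$sampleid) %>%
  map_dbl(~ cor(.x$blue, .x$red, method = "spearman"))

# Equivalent: rank within each sample, then use the default Pearson method
rho_ranked <- df %>%
  split(.$sampleid) %>%
  map_dbl(~ cor(rank(.x$blue), rank(.x$red)))

# Both approaches should agree
all.equal(rho, rho_ranked)
```

Ranking inside each group matters: ranking the whole column once and then splitting would mix ranks across samples.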
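Here is a base R way to get something similar (the values differ from the purrr output above, presumably because the random example data were regenerated between runs):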
by(df, df$sampleid, function(x) cor(x$blue, x$red))
#> df$sampleid: AB1
#> [1] 0.205726
#> --------------------------------------------------------
#> df$sampleid: AB2
#> [1] 0.3237938
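To get closer to the table asked for in the question, the per-sample results can be collected into a data frame. This is only a sketch under stated assumptions: the column names n_otus and spearman_r are made up for illustration, and the seed and example data are rebuilt here so the snippet runs on its own. It also shows why a separate R per OTU within a sample is not available from this data: each OTU contributes a single (blue, red) pair per sample, and a correlation needs at least two pairs.

```r
set.seed(1)  # assumption: fixed seed so the random example data are reproducible
df <- data.frame(
  sampleid = rep(c("AB1", "AB2"), each = 15),
  red      = runif(30, 0, 100),
  blue     = runif(30, 0, 100),
  otuid    = rep(sprintf("K%05d", 1:15), times = 2)
)

# One row per sample: Spearman's rho computed across that sample's OTUs
per_sample <- do.call(rbind, lapply(split(df, df$sampleid), function(x) {
  data.frame(
    sampleid   = x$sampleid[1],
    n_otus     = nrow(x),  # hypothetical column: number of OTU pairs used
    spearman_r = cor(x$blue, x$red, method = "spearman")
  )
}))
rownames(per_sample) <- NULL
per_sample

# A single OTU within a single sample is one (blue, red) pair,
# so its correlation is undefined and comes back as NA.
# suppressWarnings() is used in case cor() warns about the degenerate input.
one_pair <- df[df$sampleid == "AB1" & df$otuid == "K00001", ]
single_r <- suppressWarnings(cor(one_pair$blue, one_pair$red, method = "spearman"))
is.na(single_r)
```

If a per-OTU value is needed, one option is to correlate blue against red across samples for each OTU (split on otuid instead of sampleid), which requires more than the two samples in this example to be informative.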