说我有两个代表随机变量的向量:
x<-rnorm(100000)
y<-rexp(100000)
使用for
循环来计算两个向量之间的相关系数的代码是什么?
我对R很陌生,所以简单的答案会更好。谢谢。
答案 0 :(得分:2)
嗨,乔,欢迎您!
您并不需要for
循环,cor.test
可以为您提供两个向量之间的相关系数。
x<-rnorm(1000)
y<-rexp(1000)
cor.test(x,y)
您将获得以下输出:
> cor.test(x,y)
Pearson's product-moment correlation
data: x and y
t = 1.5191, df = 998, p-value = 0.1291
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.01400425 0.10969698
sample estimates:
cor
0.04803053
您还可以使用ggpubr
在散点图上对其进行绘制:
library(ggpubr)
df = data.frame(x,y)
ggscatter(df, x = "x", y = "y",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson")
您将获得以下输出:
是否使用for
循环进行多次关联
# generating a dataframe with multiple vector to compare:
df = NULL
for(i in 1:5)
{
df = data.frame(cbind(df,rnorm(1000)))
}
# Testing the correlation between columns 1 and all other columns using for loop
for(i in 1:ncol(df))
{
if(i==1){correlation = cor(df[,1],df[,i])}
else{correlation = c(correlation, cor(df[,1],df[,i]))}
}
> correlation
[1] 1.00000000 -0.05276680 -0.03968104 -0.02960876 0.01861618
# Using apply
correlation = as.vector(apply(df,2,function(x){cor(x,df[,1])}))
> correlation
[1] 1.00000000 -0.05276680 -0.03968104 -0.02960876 0.01861618
答案 1 :(得分:2)
没有理由为什么任何人都需要下面的功能。这只是R的算术运算中实现的Pearson相关公式。
cor2 <- function(x, y, na.rm = FALSE){
if(na.rm){
x <- x[!is.na(x)]
y <- y[!is.na(y)]
}
stopifnot(length(x) == length(y))
n <- length(x)
sum.x <- 0
sum.y <- 0
sum.x2 <- 0
sum.y2 <- 0
sum.xy <- 0
for(i in seq_along(x)){
sum.x <- sum.x + x[i]
sum.y <- sum.y + y[i]
sum.x2 <- sum.x2 + x[i]^2
sum.y2 <- sum.y2 + y[i]^2
sum.xy <- sum.xy + x[i]*y[i]
}
numer <- n*sum.xy - sum.x*sum.y
denom <- sqrt(n*sum.x2 - sum.x^2)*sqrt(n*sum.y2 - sum.y^2)
numer/denom
}
set.seed(1234)
x <- rnorm(20)
y <- rexp(20)
cor(x, y)
#[1] -0.07445358
cor2(x, y)
#[1] -0.07445358
这些结果不是identical()
。
identical(cor(x, y),cor2(x, y))
#[1] FALSE
all.equal(cor(x, y),cor2(x, y))
#[1] TRUE
cor(x, y) - cor2(x, y)
#[1] -4.163336e-17
但是内置功能要快得多。
x <- rnorm(1000)
y <- rexp(1000)
microbenchmark::microbenchmark(
base = cor(x, y),
rui = cor2(x, y)
)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# base 40.936 42.797 45.54924 43.482 44.8155 101.172 100 a
# rui 479.102 481.738 496.01586 483.190 491.5015 690.619 100 b