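I need to interpret the results of my correlation test correctly and check whether they are consistent with the results of the binary classification performed in a later step.
I am trying to test the correlation between two variables (from the NLP domain); my R code is below.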
library(ggpubr)  # provides ggscatter()

# Data frame used for visualization only.
displaydata = data.frame(game = as.character(Star_Ratings[,1]),
                         mean_scores = as.matrix(universal.data$score),
                         reviews = as.matrix(Star_Ratings[,2]))
# Visualization
cordata = data.frame(x = displaydata$mean_scores, y = displaydata$reviews)
ggscatter(cordata, x = "x", y = "y",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "points", ylab = "out of 5 stars")
# Correlation
x <- displaydata[["mean_scores"]]
y <- displaydata[["reviews"]]
result <- cor.test(x, y, method = "pearson")
result
Output:

	Pearson's product-moment correlation

data:  x and y
t = 0.8101, df = 48, p-value = 0.4219
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1676306  0.3821309
sample estimates:
     cor
0.116136
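
As a sanity check on how to read this output, the t statistic, p-value, and confidence interval can be recomputed from nothing but the printed correlation and degrees of freedom. The sketch below assumes the standard formulas behind a Pearson test (t = r * sqrt(df) / sqrt(1 - r^2), and a Fisher z interval for the confidence bounds); the variable names are my own and are not part of the code above.

```r
# Values copied from the cor.test() output above.
r  <- 0.116136   # sample Pearson correlation
df <- 48         # degrees of freedom, i.e. n - 2
n  <- df + 2     # number of paired observations

# t statistic for H0: true correlation is 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z transform.
z      <- atanh(r)
se     <- 1 / sqrt(n - 3)
ci_low <- tanh(z - qnorm(0.975) * se)
ci_upp <- tanh(z + qnorm(0.975) * se)

print(c(t = t_stat, p = p_value, lower = ci_low, upper = ci_upp))
```

If these agree with the printed values up to rounding, the reading is straightforward: the p-value of 0.4219 is well above 0.05 and the interval contains 0, so with 50 observations this sample gives no evidence of a linear association between the two variables.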