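I'm trying to run correlation tests across several data frames. Each data frame corresponds to a number of unique stores assigned to an employee, and each one also has a column identifying the region. The data are split by the number of unique stores each employee has: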
unique_store_breakdown <- split(Data, as.factor(Data$unique_stores))
Ideally, I'd like output like this:
Region -- unique_store -- correlation
Midwest ------- 1 -------------- .05
Midwest ------- 2 -------------- .04
.
.
Southeast ----- 1 ------------- 0.75
.
.
cor_tests <- list()
counter = 0
for (i in unique(j$region)){
  for (j in 1:length(unique_store_breakdown)){
    counter = counter + 1
    # Create new variables for correlation test
    x = as.numeric(j[j$region == i,]$quality)
    y = as.numeric(j[j$region == i,]$rsv)
    cor_tests[[counter]] <- cor.test(x, y)
  }
}
cor_tests
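I was able to run this on one data frame at a time, but when I add the nested loop (the j term), I get the error "Error: $ operator is invalid for atomic vectors". I'd also like the results returned as a data frame rather than a list.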
Answer 0 (score: 0)
If all you want is to run cor.test() for each store, then by() should make this very simple. The output of by() is a regular list; it just prints a little differently.
# example data
set.seed(1)
dtf <- data.frame(store=rep(1:3, each=30), rsv=rnorm(90))
dtf$quality <- dtf$rsv + rnorm(90, 0, dtf$store)
# perform cor.test for every store
by(dtf, dtf$store, function(x) cor.test(x$quality, x$rsv))
# dtf$store: 1
#
# Pearson's product-moment correlation
#
# data: x$quality and x$rsv
# t = 5.5485, df = 28, p-value = 6.208e-06
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# 0.4915547 0.8597796
# sample estimates:
# cor
# 0.7236681
#
# ------------------------------------------------------------------------------
# dtf$store: 2
#
# Pearson's product-moment correlation
#
# data: x$quality and x$rsv
# t = 0.68014, df = 28, p-value = 0.502
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# -0.2439893 0.4663368
# sample estimates:
# cor
# 0.1274862
#
# ------------------------------------------------------------------------------
# dtf$store: 3
#
# Pearson's product-moment correlation
#
# data: x$quality and x$rsv
# t = 2.2899, df = 28, p-value = 0.02977
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# 0.04304952 0.66261810
# sample estimates:
# cor
# 0.397159
#
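On the error in the question: inside the inner loop `j` is an integer index (1, 2, ...), not a data frame, so `j$region` fails with "$ operator is invalid for atomic vectors". To get one row per region and unique-store count as a data frame, something along these lines may work. This is only a sketch: the example data are made up, the column names `region`, `unique_stores`, `quality` and `rsv` are taken from the question, and `cor_results` and `p_value` are names chosen here for illustration.

```r
# Hypothetical example data: two regions, three unique-store counts
set.seed(1)
dtf <- expand.grid(obs = 1:30,
                   unique_stores = 1:3,
                   region = c("Midwest", "Southeast"),
                   stringsAsFactors = FALSE)
dtf$rsv <- rnorm(nrow(dtf))
dtf$quality <- dtf$rsv + rnorm(nrow(dtf), 0, dtf$unique_stores)

# Split on both grouping columns at once; drop = TRUE skips empty combinations
groups <- split(dtf, list(dtf$region, dtf$unique_stores), drop = TRUE)

# Run cor.test() per group and keep one row of results for each
rows <- lapply(groups, function(g) {
  ct <- cor.test(as.numeric(g$quality), as.numeric(g$rsv))
  data.frame(region = g$region[1],
             unique_stores = g$unique_stores[1],
             correlation = unname(ct$estimate),
             p_value = ct$p.value)
})

# Stack the per-group rows into a single data frame, sorted by region
cor_results <- do.call(rbind, rows)
cor_results <- cor_results[order(cor_results$region, cor_results$unique_stores), ]
rownames(cor_results) <- NULL
cor_results
```

Note that cor.test() stops with an error when a group has fewer than three complete pairs of observations, so very small region/store combinations would need to be filtered out first.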