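Subsetting does not work when done inside cor.test in R

Time: 2016-08-02 01:37:06

Tags: r subset correlation

I have a df with 3 columns:

  • column_1: numeric
  • column_2: numeric
  • column_3: a factor variable with two groups, A and B

I want to run a Spearman correlation test between column 1 and column 2, but only within each group (so the correlation is computed only between the column 1 and column 2 observations that belong to group A, and likewise for group B). So I am using these lines of code: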

cor.test(df$column_1, df$column_2, alternative = ("two.sided"), 
     subset(df, column_3==c("group_A")),
     data = df, method = c("spearm"))
cor.test(df$column_1, df$column_2, alternative = ("two.sided"), 
         subset(df, column_3==c("group_B")),
         data = df, method = c("spearm"))

The thing is, I get the same result in both tests, so I guess the subset function is not working, because if I subset the data beforehand, like this:

x <- subset(df, column_3==c("group_A"))
y <- subset(df, column_3==c("group_B"))

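and then run cor.test on x and y separately, I get different results. Does anyone know what is going on?

PS: I get the following warning, but I don't think it is related to my question: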

Warning message:
"In cor.test.default(cor_itir$Nart, cor_itir$Medida, alternative = "two.sided",  :cannot compute exact p-value with ties"

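2 Answers:

Answer 0: (score: 2)

You are overcomplicating things by using the df$... extractors, specifying data=, and calling subset() as a standalone function all at once. When x and y are passed as vectors, cor.test uses its default method, which has no data= or subset= arguments, so df$column_1 and df$column_2 are always the full columns; only the formula method accepts data= and subset=. You can get the same result with the following: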

# here's some example data with different correlations between each group
df <- data.frame(column_1=1:10,column_2=c(1:5,6,4,3,11,9),column_3=rep(c("a","b"),each=5))

Then just specify your formula, your data= and your subset= inline:

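# method = "spearman" added because the question asks for Spearman; the default is Pearson
cor.test(~ column_1 + column_2, alternative="two.sided", data=df, subset=(column_3=="a"), method="spearman")

cor.test(~ column_1 + column_2, alternative="two.sided", data=df, subset=(column_3=="b"), method="spearman")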
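As a sanity check, here is a minimal sketch comparing the formula interface with filtering the vectors by hand. It assumes the example df from this answer; the names keep, res_formula and res_manual are only illustrative.

```r
# Example data copied from this answer
df <- data.frame(column_1 = 1:10,
                 column_2 = c(1:5, 6, 4, 3, 11, 9),
                 column_3 = rep(c("a", "b"), each = 5))

# Formula interface: subset= is evaluated inside data=
res_formula <- cor.test(~ column_1 + column_2, data = df,
                        subset = (column_3 == "b"), method = "spearman")

# Default interface on vectors that were filtered by hand
keep <- df$column_3 == "b"
res_manual <- cor.test(df$column_1[keep], df$column_2[keep],
                       method = "spearman")

# Both routes use the same five observations, so the estimates agree
all.equal(unname(res_formula$estimate), unname(res_manual$estimate))
```

If the two estimates agree, the subset= argument is being applied as intended.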

Or do it all in one go using by:

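# method = "spearman" added because the question asks for Spearman; the default is Pearson
by(df, df$column_3, FUN = function(x) cor.test(~ column_1 + column_2, data = x, method = "spearman"))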
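The by() call returns one test object per group. The sketch below, again assuming the example df from this answer, shows one possible way to pull the estimates and p-values into a small table; tests, rho, p and summary_tbl are hypothetical names.

```r
# Example data copied from this answer
df <- data.frame(column_1 = 1:10,
                 column_2 = c(1:5, 6, 4, 3, 11, 9),
                 column_3 = rep(c("a", "b"), each = 5))

# One Spearman test per level of column_3
tests <- by(df, df$column_3, function(x)
  cor.test(~ column_1 + column_2, data = x, method = "spearman"))

# Each element is an "htest" object; extract the pieces of interest
rho <- sapply(tests, function(h) unname(h$estimate))
p   <- sapply(tests, function(h) h$p.value)

summary_tbl <- data.frame(group = names(rho), rho = rho, p.value = p,
                          row.names = NULL)
summary_tbl
```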

Answer 1: (score: 0)

Use with and subset:

with(subset(df, column_3==c("group_A")),
     cor.test(column_1, column_2, alternative = ("two.sided"), 
     method = c("spearm")))

with(subset(df, column_3==c("group_B")),
     cor.test(column_1, column_2, alternative = ("two.sided"), 
              method = c("spearm")))
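To illustrate why this works, here is a small sketch assuming the example data given in this answer. Inside with(), bare column names are looked up in the subsetted data frame, whereas df$column_1 always refers to the full column. The names n_full and n_group_A are only illustrative.

```r
# Example data copied from this answer
df <- data.frame(column_1 = 1:10,
                 column_2 = c(1:5, 6, 4, 3, 11, 9),
                 column_3 = rep(c("group_A", "group_B"), each = 5))

# df$column_1 always refers to the full, unfiltered column
n_full <- length(df$column_1)

# Inside with(), column_1 is looked up in the subsetted data frame
n_group_A <- with(subset(df, column_3 == "group_A"), length(column_1))

c(full = n_full, group_A = n_group_A)
```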

Edit

Adding data

df <- data.frame(column_1=1:10,column_2=c(1:5,6,4,3,11,9),column_3=rep(c("group_A","group_B"),each=5))

> with(subset(df, column_3==c("group_A")),
+      cor.test(column_1, column_2, alternative = ("two.sided"), 
+               method = c("spearman")))

    Spearman's rank correlation rho

data:  column_1 and column_2
S = 4.4409e-15, p-value = 0.01667
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho 
  1 


> with(subset(df, column_3==c("group_B")),
+      cor.test(column_1, column_2, alternative = ("two.sided"), 
+               method = c("spearman")))

    Spearman's rank correlation rho

data:  column_1 and column_2
S = 10, p-value = 0.45
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho 
0.5