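My question is about computing Cramér's V to detect association between categorical variables. My real dataset has missing values, so I created a fake dataset to illustrate the problem: two variables, a and b, one of which contains NA.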
a <- factor(c("M","","F","F","","M","F","F"))
a2 <- factor(a, levels = c('M','F'),labels =c('Male','Female'))
b <- factor(c("y","y","","y","n","n","n","y"))
b2 <- factor(b, levels=c("y","n"),labels=c("yes","no"))
df<-cbind(a2,b2)
The assocstats function gives a result for Cramér's V:
require(vcd)
> tab <-table(a,b)
> assocstats(tab)
                    X^2 df P(> X^2)
Likelihood Ratio 1.7261  4  0.78597
Pearson          1.3333  4  0.85570

Phi-Coefficient   : 0.408
Contingency Coeff.: 0.378
Cramer's V        : 0.289
Now I want to remove the NA from the levels:
a[a==""]<-NA
a3 <- droplevels(a)
levels(a3)
tab <-table(a,b)
assocstats(tab)
But every time I remove the NAs, the result looks like this:
                     X^2 df P(> X^2)
Likelihood Ratio 0.13844  2  0.93312
Pearson              NaN  2      NaN

Phi-Coefficient   : NaN
Contingency Coeff.: NaN
Cramer's V        : NaN
Also, because I have a large dataset, I want to compute a matrix of Cramér's V values. I found this code on Stack Overflow, and it seems to work...
get.V<-function(y){
  col.y<-ncol(y)
  V<-matrix(ncol=col.y,nrow=col.y)
  for(i in 1:col.y){
    for(j in 1:col.y){
      V[i,j]<-assocstats(table(y[,i],y[,j]))$cramer
    }
  }
  return(V)
}
get.V(tab)
However, the result differs from the one I get with the assocstats function:
[,1] [,2] [,3]
[1,] 1.0 0.5 1
[2,] 0.5 1.0 1
[3,] 1.0 1.0 1
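This cannot be right, because I get this result every time, even when I change the number of observations... What is wrong with this code?
Conclusion: I don't know which result is correct. I have a large dataset with many NAs. The first assocstats result and the get.V code give different results, although the difference is not large, because the code only builds a matrix. The second assocstats call gives only NaN. I cannot spot any mistake... Can anyone help me?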
Answer 0 (score: 0)
If you are using factor, you do not have to replace "" with NA: any unique value that you do not list in levels will be turned into NA by factor.
a <- factor(c("M","","F","F","","M","F","F"))
a2 <- factor(a, levels = c('M','F'),labels =c('Male','Female'))
a
# [1] M   F F   M F F
# Levels:  F M
a2
# [1] Male <NA> Female Female <NA> Male Female Female
# Levels: Male Female
b <- factor(c("y","y","","y","n","n","n","y"))
b2 <- factor(b, levels=c("y","n"),labels=c("yes","no"))
(df <- cbind(a2,b2))
# a2 b2
# [1,] 1 1
# [2,] NA 1
# [3,] 2 NA
# [4,] 2 1
# [5,] NA 2
# [6,] 1 2
# [7,] 2 2
# [8,] 2 1
Above, cbind creates a matrix, which loses all the labels you created with factor and keeps only the integer codes. I think you want a data frame:
(df <- data.frame(a2,b2))
# a2 b2
# 1 Male yes
# 2 <NA> yes
# 3 Female <NA>
# 4 Female yes
# 5 <NA> no
# 6 Male no
# 7 Female no
# 8 Female yes
require('vcd')
(tab <- table(a2,b2, useNA = 'ifany'))
# b2
# a2 yes no <NA>
# Male 1 1 0
# Female 2 1 1
# <NA> 1 1 0
(tab <- table(a2,b2))
# b2
# a2 yes no
# Male 1 1
# Female 2 1
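If you want to see the NA values in the table, you have to tell table explicitly. Otherwise it drops them by default, so they are already "excluded" by the time you call assocstats: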
assocstats(tab)
# X^2 df P(> X^2)
# Likelihood Ratio 0.13844 1 0.70983
# Pearson 0.13889 1 0.70939
#
# Phi-Coefficient : 0.167
# Contingency Coeff.: 0.164
# Cramer's V : 0.167
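As for the NaN in the question, a likely cause is that assigning NA to the values does not remove the "" level, and the table there was still built from a rather than from the result of droplevels. The empty level then contributes an all-zero row, the expected counts in that row are 0, and the Pearson statistic becomes 0/0. The sketch below uses base R only and is meant as an illustration of that explanation, not as a trace of what assocstats does internally; a3 and b3 are just illustrative names.

```r
a <- factor(c("M","","F","F","","M","F","F"))
b <- factor(c("y","y","","y","n","n","n","y"))

a[a == ""] <- NA          # the values become NA, but the "" level remains
levels(a)                 # "" "F" "M"
rowSums(table(a, b))      # the "" row has a total count of 0

# Recode both variables, then drop the levels that are no longer used
b[b == ""] <- NA
a3 <- droplevels(a)
b3 <- droplevels(b)
levels(a3)                # "F" "M"
table(a3, b3)             # 2 x 2 table with no empty row or column
```

Building the table from a3 and b3 gives the same 2 x 2 counts as table(a2, b2) above.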
For get.V, pass a data frame or a matrix of the variables, not a table:
get.V <- function(y) {
  col.y <- ncol(y)
  V <- matrix(ncol=col.y,nrow=col.y)
  for(i in 1:col.y){
    for(j in 1:col.y){
      V[i,j] <- assocstats(table(y[,i],y[,j]))$cramer
    }
  }
  return(V)
}
get.V(df)
# [,1] [,2]
# [1,] 1.0000000 0.1666667
# [2,] 0.1666667 1.0000000
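Calling get.V(tab) in the question treated each column of the count table as if it were a variable, which is why the numbers made no sense. If you want to check the matrix without vcd, here is a minimal sketch that computes Cramér's V by hand from the usual formula V = sqrt(X^2 / (n * min(r - 1, c - 1))). The names cramers_v and cramers_matrix are hypothetical helpers, and dropping empty rows and columns per pair is my own assumption about how you want missing values handled.

```r
# Cramér's V for two categorical vectors; pairs containing NA are dropped by table()
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # remove levels that are empty for this particular pair (avoids 0/0)
  tab <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2 <- sum((tab - expected)^2 / expected)   # Pearson X^2, no continuity correction
  sqrt(chi2 / (n * min(dim(tab) - 1)))
}

# Pairwise matrix over the columns of a data frame
cramers_matrix <- function(d) {
  d <- as.data.frame(d)
  p <- ncol(d)
  V <- matrix(NA_real_, nrow = p, ncol = p, dimnames = list(names(d), names(d)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      V[i, j] <- cramers_v(d[[i]], d[[j]])
    }
  }
  V
}

a2 <- factor(c("M","","F","F","","M","F","F"), levels = c("M","F"), labels = c("Male","Female"))
b2 <- factor(c("y","y","","y","n","n","n","y"), levels = c("y","n"), labels = c("yes","no"))
V <- cramers_matrix(data.frame(a2, b2))
V   # off-diagonal entries are about 0.1667, diagonal entries are 1
```

Each pair uses only the rows where both variables are observed, so with many NAs the effective sample size can differ from cell to cell.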