Error in chisq.test() in R - invalid 'type' (character) of argument

Time: 2014-11-04 05:17:28

Tags: r chi-squared

I am running a chi-squared test on a data.frame called Comp1, which has two binary variables and 13109 observations.

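I use the test before clustering consumers by demographic characteristics: if the two variables depend on each other, certain values will end up together in the same cluster. The two variables are a subset of another data.frame that contains 36 variables.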

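I get an error message saying the argument is of invalid 'type' (character), as if the data.frame contained character variables, even though str() shows that both variables are factors.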

Why does the error say that the data.frame contains character values?

Data:

> str(Comp1)
'data.frame':   13109 obs. of  2 variables:
 $ HomeOwnerStatus: Factor w/ 2 levels "Own","Rent": 1 2 2 2 1 2 1 1 2 2 ...
 $ MaritalStatus  : Factor w/ 2 levels "Married","Single": 2 1 1 1 2 1 2 1 1 1 ...

Example:

> #Create dataset
> homeownerstatus <- c("Own", "Rent", "Own", "Own", "Rent", "Own")
> maritalstatus <- c("Married", "Married", "Married", "Single", "Single", "Married")
> Comp1 <- data.frame(homeownerstatus, maritalstatus)

Attempted solution and result:

> #Test binary variables for independence 
> #Create matrix from data.frame
> DF4 <- as.matrix(Comp1)
> #Comparison of marital status and home owner status
> #Perform chi-squared test for independence of two variables
> chisq.test(table(DF4))

    Chi-squared test for given probabilities

data:  table(DF4)
X-squared = 295149.5, df = 71, p-value < 2.2e-16

1 answer:

Answer (score: 1)

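chisq.test expects either factor vectors for the x and y arguments, or a matrix or data.frame for the x argument alone. When you pass a data.frame, it is converted to a matrix with as.matrix. This step coerces the factor columns of the data.frame to character, which is why the error reports character values.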

> as.matrix(Comp1)
     homeownerstatus maritalstatus
[1,] "Own"           "Married"    
[2,] "Rent"          "Married"    
[3,] "Own"           "Married"    
[4,] "Own"           "Single"     
[5,] "Rent"          "Single"     
[6,] "Own"           "Married"
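A minimal sketch that reproduces the coercion on the six-row example, assuming a current R installation (the name m is only illustrative). stringsAsFactors = TRUE is added as an assumption about the setup: data.frame() stopped converting strings to factors by default in R 4.0.0, so without it the columns would already be character.

```r
# Six-row example from the question; stringsAsFactors = TRUE keeps the
# columns as factors (the default became FALSE in R 4.0.0).
Comp1 <- data.frame(
  homeownerstatus = c("Own", "Rent", "Own", "Own", "Rent", "Own"),
  maritalstatus   = c("Married", "Married", "Married", "Single", "Single", "Married"),
  stringsAsFactors = TRUE
)

# Both columns are factors before the conversion...
print(sapply(Comp1, is.factor))

# ...but as.matrix(), which chisq.test() applies to a data.frame,
# turns the factor levels into plain strings.
m <- as.matrix(Comp1)
print(typeof(m))  # "character"

# Summing strings is impossible; this is the kind of arithmetic that
# produces "invalid 'type' (character) of argument".
sum_result <- tryCatch(sum(m), error = function(e) conditionMessage(e))
print(sum_result)

# Passing the data.frame itself therefore stops with an error.
res <- try(chisq.test(Comp1), silent = TRUE)
print(inherits(res, "try-error"))
```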

So my suggestion is to pass the two factor vectors:

chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus)

        Pearson's Chi-squared test with Yates' continuity correction

data:  Comp1$homeownerstatus and Comp1$maritalstatus
X-squared = 0, df = 1, p-value = 1

Warning message:
In chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus) :
  Chi-squared approximation may be incorrect
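The warning above is tied to the expected cell counts. A short sketch on the six-row example (tab and expected are illustrative names) computes the counts Pearson's test expects under independence, row total times column total divided by n; chisq.test() emits "Chi-squared approximation may be incorrect" when any of them is below 5, which holds for every cell of this tiny sample.

```r
Comp1 <- data.frame(
  homeownerstatus = c("Own", "Rent", "Own", "Own", "Rent", "Own"),
  maritalstatus   = c("Married", "Married", "Married", "Single", "Single", "Married"),
  stringsAsFactors = TRUE
)

# Observed 2 x 2 contingency table.
tab <- table(Comp1$homeownerstatus, Comp1$maritalstatus)
print(tab)

# Expected counts under independence: row total * column total / n.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(round(expected, 3))

# Every expected count is below 5, which triggers the warning.
print(any(expected < 5))  # TRUE
```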

Edit

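When you pass a matrix or data.frame as the x argument, chisq.test treats the object itself as a contingency table, which is not what you want. You have two binary variables, from which a contingency table should first be computed and then tested with the chi-squared test. So either pass each factor vector separately, as above, or compute the contingency table yourself and pass it to chisq.test: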

chisq.test(table(Comp1))
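The sketch below, again rebuilding the six-row example (tab1, tab2, r1, r2 and r3 are illustrative names), contrasts two uses of table(). On the data.frame, table() cross-tabulates the two columns into a 2 x 2 table. On the character matrix returned by as.matrix(), it pools all cells into one vector and counts each label, giving a one-dimensional table, and chisq.test() runs a goodness-of-fit test on a one-dimensional table. That would be consistent with the question's output being headed "Chi-squared test for given probabilities" for data: table(DF4), where DF4 <- as.matrix(Comp1).

```r
Comp1 <- data.frame(
  homeownerstatus = c("Own", "Rent", "Own", "Own", "Rent", "Own"),
  maritalstatus   = c("Married", "Married", "Married", "Single", "Single", "Married"),
  stringsAsFactors = TRUE
)

# table() on the data.frame cross-tabulates the two columns.
tab2 <- table(Comp1)
print(dim(tab2))  # 2 2

# table() on the character matrix counts the four labels across all cells.
tab1 <- table(as.matrix(Comp1))
print(dim(tab1))  # 4

# 1-D table: goodness-of-fit test; 2-D table: test of independence.
# suppressWarnings() hides the small-sample warning for this demo only.
r1 <- suppressWarnings(chisq.test(tab1))
r2 <- suppressWarnings(chisq.test(tab2))
print(r1$method)
print(r2$method)

# The 2-D table gives the same statistic as passing the two factors.
r3 <- suppressWarnings(chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus))
print(all.equal(unname(r2$statistic), unname(r3$statistic)))
```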