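I am creating a chi-squared test using a call to a data.frame with two binary variables and 13109 obs.
I am using the test before clustering consumers based on demographics. If the two variables depend on each other, certain values will end up in the same clusters. The two variables are a subset of another data.frame that contains 36 variables.
I receive an error message stating that the data.frame has character variables, rather than the factors shown by the str() function.
Why does the error indicate that the data.frame has character values?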
Data:
> str(Comp1)
'data.frame': 13109 obs. of 2 variables:
$ HomeOwnerStatus: Factor w/ 2 levels "Own","Rent": 1 2 2 2 1 2 1 1 2 2 ...
$ MaritalStatus : Factor w/ 2 levels "Married","Single": 2 1 1 1 2 1 2 1 1 1 ...
Example:
> #Create dataset
> homeownerstatus <- c("Own", "Rent", "Own", "Own", "Rent", "Own")
> maritalstatus <- c("Married", "Married", "Married", "Single", "Single", "Married")
> Comp1 <- data.frame(homeownerstatus, maritalstatus)
Attempted solution, with error:
> #Test binary variables for independence
> #Create matrix from data.frame
> DF4 <- as.matrix(Comp1)
> #Comparison of marital status and home owner status
> #Perform chi-squared test for independence of two variables
> chisq.test(table(DF4))
Chi-squared test for given probabilities
data: table(DF4)
X-squared = 295149.5, df = 71, p-value < 2.2e-16
Answer 0 (score: 1)
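chisq.test wants either a factor vector for each of its x and y arguments, or a matrix or data.frame for its x argument. When a data.frame is passed, it is converted to a matrix by the function as.matrix. This step coerces the factor columns of the data.frame to character.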
> as.matrix(Comp1)
homeownerstatus maritalstatus
[1,] "Own" "Married"
[2,] "Rent" "Married"
[3,] "Own" "Married"
[4,] "Own" "Single"
[5,] "Rent" "Single"
[6,] "Own" "Married"
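The question's output ("Chi-squared test for given probabilities") can be traced with a minimal sketch on the toy data. It assumes the columns are factors, so stringsAsFactors = TRUE is set explicitly here (newer R versions do not convert strings to factors by default); the names m, flat and cross are illustrative only. Calling table() on the character matrix pools every cell into one one-dimensional table, so chisq.test() falls back to a goodness-of-fit test instead of a test of independence.

```r
# Toy data from the question; stringsAsFactors = TRUE is an assumption
# made so that the columns are factors, as in the str() output.
homeownerstatus <- c("Own", "Rent", "Own", "Own", "Rent", "Own")
maritalstatus <- c("Married", "Married", "Married", "Single", "Single", "Married")
Comp1 <- data.frame(homeownerstatus, maritalstatus, stringsAsFactors = TRUE)

# as.matrix() turns the factor columns into one character matrix.
m <- as.matrix(Comp1)
is.character(m)

# table() on that matrix counts all values together in a 1-D table ...
flat <- table(m)
length(dim(flat))

# ... whereas table() on the data.frame cross-tabulates the two columns.
cross <- table(Comp1)
dim(cross)

# The 1-D table leads to a goodness-of-fit test, the 2-D table to a test
# of independence. Warnings about small counts are suppressed for the toy data.
gof <- suppressWarnings(chisq.test(flat))
ind <- suppressWarnings(chisq.test(cross))
gof$method
ind$method
```

Comparing gof$method with ind$method shows which of the two tests was actually run.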
So, my suggestion is to pass the two factor vectors:
chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus)
Pearson's Chi-squared test with Yates' continuity correction
data: Comp1$homeownerstatus and Comp1$maritalstatus
X-squared = 0, df = 1, p-value = 1
Warning message:
In chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus) :
Chi-squared approximation may be incorrect
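The warning in the output above comes from the very small toy sample. As a rough sketch (the variable name res is illustrative, and the vectors are rebuilt here as factors so the block runs on its own), the expected cell counts can be inspected through the expected component of the result; chisq.test() warns when any of them is below 5.

```r
# Rebuild the toy vectors as factors so this block is self-contained.
homeownerstatus <- factor(c("Own", "Rent", "Own", "Own", "Rent", "Own"))
maritalstatus <- factor(c("Married", "Married", "Married", "Single", "Single", "Married"))

# Suppress the warning here only because we inspect its cause directly.
res <- suppressWarnings(chisq.test(homeownerstatus, maritalstatus))

# Expected counts under independence; with six observations
# every cell is far below 5.
res$expected
any(res$expected < 5)
```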
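Edit:
When you pass a matrix or data.frame to the x argument, the object is treated as a contingency table, which is not what you want. You have two binary variables from which a contingency table should be computed and then tested with the chi-squared test. So you should either pass each factor vector as shown above or compute the contingency table and pass it to chisq.test.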
chisq.test(table(Comp1))
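A short sketch, again on the toy data and with stringsAsFactors = TRUE as an assumption, suggests that the two recommended routes agree. The names by_vectors and by_table are illustrative.

```r
# Toy data from the question, with the columns stored as factors.
homeownerstatus <- c("Own", "Rent", "Own", "Own", "Rent", "Own")
maritalstatus <- c("Married", "Married", "Married", "Single", "Single", "Married")
Comp1 <- data.frame(homeownerstatus, maritalstatus, stringsAsFactors = TRUE)

# Route 1: pass the two factor vectors as x and y.
by_vectors <- suppressWarnings(
  chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus)
)

# Route 2: build the 2 x 2 contingency table first and pass it as x.
by_table <- suppressWarnings(chisq.test(table(Comp1)))

# Both routes should report the same statistic and p-value.
all.equal(unname(by_vectors$statistic), unname(by_table$statistic))
all.equal(by_vectors$p.value, by_table$p.value)
```

Passing the table explicitly has the side benefit that the cross-tabulation can be printed and checked before it is tested.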