For 1,000,000 observations, I observed a discrete event X 3 times in the control group and 10 times in the test group.
I need to do a chi-square test of independence in MATLAB. This is how you would do it in R:
m <- rbind(c(3, 1000000-3), c(10, 1000000-10))
# [,1] [,2]
# [1,] 3 999997
# [2,] 10 999990
chisq.test(m)
The R function returns chi-squared = 2.7692, df = 1, p-value = 0.0961.
Which MATLAB function(s) should I use or write to do this?
Answer 0 (score: 14)
Here is the implementation I use myself:
function [hNull pValue X2] = ChiSquareTest(o, alpha)
%# CHISQUARETEST Pearson's Chi-Square test of independence
%#
%# @param o The contingency table of the joint frequencies
%# of the two events (attributes)
%# @param alpha Significance level for the test
%# @return hNull hNull = 1: null hypothesis not rejected (independent)
%# hNull = 0: null hypothesis rejected (dependent)
%# @return pValue The p-value of the test (the probability, under hNull,
%# of a statistic at least as large as the observed X2)
%# @return X2 The value of the chi-square statistic
%#
%# o: observed frequency
%# e: expected frequency
%# dof: degree of freedom
[r c] = size(o);
dof = (r-1)*(c-1);
%# e_ij = (count(A=ai)*count(B=bj)) / N
e = sum(o,2)*sum(o,1) / sum(o(:));
%# [ sum_r [ sum_c ((o_ij-e_ij)^2/e_ij) ] ]
X2 = sum(sum( (o-e).^2 ./ e ));
%# p-value: upper-tail probability of X2 under a chi-square distribution with dof degrees of freedom
pValue = 1 - chi2cdf(X2, dof);
hNull = (pValue > alpha);
%# X2 value needed to reject hNull at the significance level with dof
%#X2table = chi2inv(1-alpha, dof);
%#hNull = (X2table > X2);
end
Example:
t = [3 999997 ; 10 999990]
[hNull pVal X2] = ChiSquareTest(t, 0.05)
hNull =
1
pVal =
0.052203
X2 =
3.7693
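To check the arithmetic behind X2 = 3.7693 by hand (the second line applies R's continuity correction, which shrinks each |O - E| = 3.5 to 3, for comparison with the value R reports):

```latex
\begin{aligned}
N &= 2\,000\,000,\quad R_1 = R_2 = 1\,000\,000,\quad C_1 = 13,\quad C_2 = 1\,999\,987\\
E_{ij} &= \frac{R_i C_j}{N}:\quad E_{11} = E_{21} = 6.5,\quad E_{12} = E_{22} = 999\,993.5\\
X^2 &= \sum_{i,j}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 2\cdot\frac{3.5^2}{6.5} + 2\cdot\frac{3.5^2}{999\,993.5} \approx 3.76923 + 0.00002 \approx 3.7693\\
X^2_{\text{Yates}} &= \sum_{i,j}\frac{(|O_{ij}-E_{ij}| - 0.5)^2}{E_{ij}} = 2\cdot\frac{3^2}{6.5} + 2\cdot\frac{3^2}{999\,993.5} \approx 2.7692
\end{aligned}
```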
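Note that the results differ from yours because chisq.test applies a correction by default. According to ?chisq.test:
correct: a logical indicating whether to apply continuity correction when computing the test statistic for 2 by 2 tables: one half is subtracted from all |O - E| differences.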
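To reproduce R's corrected numbers in MATLAB, the continuity correction can be applied by hand. This is a minimal sketch, assuming R caps the correction at min(0.5, |O - E|) (its help page says the correction is never bigger than the differences themselves). It uses core MATLAB gammainc(x, a, 'upper') for the chi-square upper tail instead of chi2cdf, so no toolbox is needed:

```matlab
% Pearson chi-square test with Yates' continuity correction for a 2x2 table,
% intended to mirror R's default chisq.test(m) (correct = TRUE).
% Assumption: the correction is min(0.5, smallest |O - E|), as in R.
o = [3 999997; 10 999990];                 % observed counts
e = sum(o, 2) * sum(o, 1) / sum(o(:));     % expected counts under independence
d = abs(o - e);                            % |O - E| for every cell
yates = min(0.5, min(d(:)));               % correction, never larger than the differences
X2 = sum(sum((d - yates).^2 ./ e));        % corrected statistic
dof = (size(o,1) - 1) * (size(o,2) - 1);   % degrees of freedom (1 for a 2x2 table)
pValue = gammainc(X2/2, dof/2, 'upper');   % P(chi2 with dof degrees of freedom >= X2)
fprintf('X2 = %.4f, p = %.4f\n', X2, pValue)
```

This prints X2 = 2.7692, p = 0.0961, the same values R reports.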
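Alternatively, if you have the actual observations of the two events in question, you can use the CROSSTAB function, which computes the contingency table and returns the Chi2 and p-value measures: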
X = randi([1 2],[1000 1]);
Y = randi([1 2],[1000 1]);
[t X2 pVal] = crosstab(X,Y)
t =
229 247
257 267
X2 =
0.087581
pVal =
0.76728
The equivalent in R is:
chisq.test(X, Y, correct = FALSE)
Note: both (MATLAB) approaches above require the Statistics Toolbox.
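Without the Statistics Toolbox, both steps can be done with core functions: accumarray builds the contingency table and gammainc(x, a, 'upper') gives the chi-square upper-tail probability. A minimal sketch of the uncorrected test, assuming X and Y hold positive integer category labels as in the randi example above (a label value that never occurs would leave an all-zero row or column, and therefore a NaN statistic, so such levels would have to be dropped first):

```matlab
% Toolbox-free contingency table + uncorrected Pearson chi-square test.
% Assumption: X and Y contain positive integer labels 1, 2, ...
X = randi([1 2], [1000 1]);
Y = randi([1 2], [1000 1]);
o = accumarray([X Y], 1);                  % o(i,j) = number of observations with X == i and Y == j
e = sum(o, 2) * sum(o, 1) / sum(o(:));     % expected counts under independence
X2 = sum(sum((o - e).^2 ./ e));            % Pearson statistic, no continuity correction
dof = (size(o,1) - 1) * (size(o,2) - 1);   % degrees of freedom
pVal = gammainc(X2/2, dof/2, 'upper');     % upper tail of chi-square with dof degrees of freedom
```

Like crosstab and chisq.test(X, Y, correct = FALSE), this applies no continuity correction.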
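Answer 1 (score: 0)
This function tests for independence using both the Pearson chi-square statistic and the likelihood-ratio statistic, and also computes the residuals. I know it could be vectorized further, but I wanted to show the math of each step.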
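For comparison, the same quantities (Pearson statistic, likelihood-ratio statistic, adjusted residuals and degrees of freedom) can be computed without loops. This is a sketch, not the answerer's code: the name independenceTestVectorized is hypothetical, and empty cells contribute 0 to G under the usual 0*log(0) = 0 convention. Save it as independenceTestVectorized.m:

```matlab
function [chi, G, residuals, df] = independenceTestVectorized(data)
% Hypothetical loop-free version of independenceTest (sketch).
% data: r-by-c table of observed counts.
rowTotals = sum(data, 2);                  % r-by-1 row sums
colTotals = sum(data, 1);                  % 1-by-c column sums
N = sum(data(:));                          % overall total
df = (size(data,1) - 1) * (size(data,2) - 1);
u = rowTotals * colTotals / N;             % expected frequencies (outer product, r-by-c)
chi = sum(sum((data - u).^2 ./ u));        % Pearson chi-square statistic
lr = data .* log(data ./ u);               % per-cell likelihood-ratio terms
lr(data == 0) = 0;                         % 0*log(0) evaluates to NaN; treat it as 0
G = 2 * sum(sum(lr));                      % likelihood-ratio (G) statistic
% Adjusted standardized residuals; the outer product of the two
% (1 - total/N) factors is r-by-c, so no implicit expansion is needed
residuals = (data - u) ./ sqrt(u .* ((1 - rowTotals / N) * (1 - colTotals / N)));
end
```

For example, [chi, G] = independenceTestVectorized([3 999997; 10 999990]) gives chi of about 3.7693, matching ChiSquareTest above.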
function [chi, G, residuals] = independenceTest(data)
df = (size(data,1)-1)*(size(data,2)-1); % Degrees of freedom (also the mean of the chi-square distribution)
sd = sqrt(2*df); % Standard deviation of the chi-square distribution
u = nan(size(data)); % Estimated expected frequencies
p = nan(size(data)); % Per-cell terms of the chi-square statistic
lr = nan(size(data)); % Per-cell terms of the likelihood-ratio statistic
residuals = nan(size(data)); % Adjusted standardized residuals
rowTotals = sum(data,2); % Row sums (one per row)
colTotals = sum(data,1); % Column sums (one per column)
overallTotal = sum(data(:));
%% Calculate estimated expected frequencies
for r=1:size(data,1)
for c=1:size(data,2)
u(r,c) = (rowTotals(r) * colTotals(c)) / overallTotal;
end
end
%% Calculate chi-squared statistic
for r=1:size(data,1)
for c=1:size(data,2)
p(r,c) = (data(r,c) - u(r,c))^2 / u(r,c);
end
end
chi = sum(sum(p)); % Chi-square statistic
%% Calculate likelihood-ratio statistic
for r=1:size(data,1)
for c=1:size(data,2)
if data(r,c) > 0
lr(r,c) = data(r,c) * log(data(r,c) / u(r,c));
else
lr(r,c) = 0; % Empty cell: take 0*log(0) as 0 (MATLAB would return NaN)
end
end
end
G = 2 * sum(sum(lr)); % Likelihood-ratio statistic
%% Calculate residuals
for r=1:size(data,1)
for c=1:size(data,2)
numerator = data(r,c) - u(r,c);
denominator = sqrt(u(r,c) * (1 - rowTotals(r)/overallTotal) * (1 - colTotals(c)/overallTotal));
residuals(r,c) = numerator / denominator;
end
end