Question

I am trying to run a chi-squared test on two classes of biological data. I have a data frame like this:

         Brain  Cerebelum  Heart  Kidney  liver  testis
expected     3         66      1      44     34      88
observed     6         57      4      45     35      69

structure(list(Brain = c(3L, 6L), Cerebelum = c(66L, 57L), heart = c(1L, 
4L), kidney = 44:45, liver = 34:35, testis = c(88L, 69L)), .Names = c("Brain", 
"Cerebelum", "heart", "kidney", "liver", "testis"), class = "data.frame", row.names = c("rand", 
"cns"))

I ran the test in Python:

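from scipy.stats import chisquare
exp = [3, 66, 1, 44, 34, 88]  # "expected" row of the data frame
obs = [6, 57, 4, 45, 35, 69]  # "observed" row of the data frame
chisquare(obs, f_exp=exp)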

which gave this result:

Power_divergenceResult(statistic=17.381684491978611, pvalue=0.0038300192430189722)

I tried to reproduce the result in R, so I wrote the data to a CSV file, imported it into R as a data frame, and ran:

d<-read.csv(file)
chisq.test(d)

Pearson's Chi-squared test

data:  d
X-squared = 4.9083, df = 5, p-value = 0.4272

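Why are the chi-squared statistic and the p-value different in Python and R? When I calculate it by hand with the simple (O-E)^2 / E formula, I get the same chi-squared value as Python, 17.38, but I cannot figure out how R arrives at 4.90.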

Answer 1

I can answer your first question.

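When you give chisq.test a matrix with at least two rows and columns, it treats it as a two-dimensional contingency table and tests for independence between the observations along the rows and columns.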
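To make the 4.9083 figure concrete, here is a minimal sketch in plain Python of the independence statistic for a two-way table, where the expected counts come from the row and column totals. The names `table` and `independence_statistic` are illustrative assumptions, not part of R or SciPy, and this is not R's actual implementation.

```python
# Counts from the question: first row "expected", second row "observed".
# Treated as a 2 x 6 contingency table, both rows count as observations.
table = [
    [3, 66, 1, 44, 34, 88],
    [6, 57, 4, 45, 35, 69],
]


def independence_statistic(rows):
    """Pearson statistic for a two-way table (no continuity correction).

    Hypothetical helper: expected cell counts are
    row_total * column_total / grand_total.
    """
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


independence_stat = independence_statistic(table)
dof = (len(table) - 1) * (len(table[0]) - 1)

# Should agree with the R output in the question:
# X-squared = 4.9083, df = 5.
print(round(independence_stat, 4), dof)
```

Here the "expected" row is not used as a reference distribution at all, which is why the statistic is so much smaller than the hand calculation.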

scipy.stats.chisquare, on the other hand, just computes the familiar X = sum( (O_i - E_i)^2 / E_i ).
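The sum above is easy to check by hand. A minimal sketch in plain Python, using the counts from the question; the helper name `pearson_statistic` is an illustrative assumption, not a SciPy function.

```python
# Counts taken from the data frame in the question.
expected = [3, 66, 1, 44, 34, 88]
observed = [6, 57, 4, 45, 35, 69]


def pearson_statistic(obs, exp):
    """Sum of (O_i - E_i)^2 / E_i over all categories (hypothetical helper)."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


raw_stat = pearson_statistic(observed, expected)

# Matches the statistic reported in the question, 17.3816...
print(round(raw_stat, 4))
```

Note that the expected counts are used exactly as given, without being rescaled to the observed total.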

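So how do you square the circle in R? First, pass the observed counts as x and define the expected probabilities in the argument p. Second, pass correct = FALSE so that R does not apply its default continuity correction.

e <- d[1, ]
o <- d[2, ]
chisq.test(o, p = e / sum(e), correct = FALSE)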

Chi-squared test for given probabilities

data:  o
X-squared = 17.139, df = 5, p-value = 0.004243

Voila.
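R's 17.139 is close to, but not identical to, SciPy's 17.38. One explanation, offered here as a reading of the numbers rather than something the answer states: p = e / sum(e) turns the expected counts into proportions, and the goodness-of-fit test multiplies those proportions by the observed total (216), whereas the raw expected counts sum to 236. The sketch below, in plain Python with hypothetical helper names, reproduces the effect. I believe newer SciPy releases reject inputs whose observed and expected totals differ, so treat that as something to verify for your version.

```python
expected = [3, 66, 1, 44, 34, 88]
observed = [6, 57, 4, 45, 35, 69]


def pearson_statistic(obs, exp):
    """Sum of (O_i - E_i)^2 / E_i over all categories (hypothetical helper)."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


n_observed = sum(observed)  # 216
n_expected = sum(expected)  # 236

# Rescale the expected counts so that they sum to the observed total,
# which is what passing proportions via p amounts to.
rescaled_expected = [n_observed * e / n_expected for e in expected]
rescaled_stat = pearson_statistic(observed, rescaled_expected)

# Should agree with the R output above, X-squared = 17.139.
print(round(rescaled_stat, 3))
```

If the two rows had the same total, the rescaling would change nothing and both statistics would coincide.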

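PS A tricky question for SO, probably better suited to Cross Validated? Note that, compared with scipy, R's default correction may well be a good thing. Whether that is true is definitely one for Cross Validated.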

PPS The help at ?chisq.test is a tricky one to parse, but I think it is all in there somewhere ;)

     If ‘x’ is a matrix with one row or column, or if ‘x’ is a vector
     and ‘y’ is not given, then a _goodness-of-fit test_ is performed
     (‘x’ is treated as a one-dimensional contingency table).  The
     entries of ‘x’ must be non-negative integers.  In this case, the
     hypothesis tested is whether the population probabilities equal
     those in ‘p’, or are all equal if ‘p’ is not given.

     If ‘x’ is a matrix with at least two rows and columns, it is taken
     as a two-dimensional contingency table: the entries of ‘x’ must be
     non-negative integers.  Otherwise, ‘x’ and ‘y’ must be vectors or
     factors of the same length; cases with missing values are removed,
     the objects are coerced to factors, and the contingency table is
     computed from these.  Then Pearson's chi-squared test is performed
     of the null hypothesis that the joint distribution of the cell
     counts in a 2-dimensional contingency table is the product of the
     row and column marginals.

and

 correct: a logical indicating whether to apply continuity correction
          when computing the test statistic for 2 by 2 tables: one half
          is subtracted from all |O - E| differences; however, the
          correction will not be bigger than the differences
          themselves.  No correction is done if ‘simulate.p.value =
          TRUE’.

Different P values for the chi-squared test in Python and R