从巴特利特对R中方差同质性的检验中得出回归方差

时间:2016-06-30 21:51:56

标签: r variance

Bartlett的测试允许您测试不同组中的方差是否相同。

R中的stats包具有函数bartlett.test。这是使用R。

中提供的数据集的示例
bt <- bartlett.test(count ~ spray, data = InsectSprays)

如何从bartlett.test获得实际差异?

我似乎无法在对象bt

中找到它
names(bt)
[1] "data.name" "method"    "p.value"   "parameter" "statistic"

您可以使用var()自行计算方差。一种方法是使用summaryBy

library(doBy)
summaryBy(count~spray, data=InsectSprays, FUN=var)

但是,您希望bartlett.test提供每组的差异。同样,计算R中的t.test也可以得出每组的平均值。那么,我们可以从R中的bartlett.test中提取每个组的方差,以及如何?

2 个答案:

答案 0 :(得分:2)

不,不幸的是你不能。

方差不会出现在返回对象的结构中。

我阅读了该函数的来源,您可以通过重写一点来提取它,但这比定制解决方案要多得多。

你可以在这里看到我的意思:

#  File src/library/stats/R/bartlett.test.R
#  Part of the R package, https://www.R-project.org
#
#  Copyright (C) 1995-2015 The R Core Team
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  A copy of the GNU General Public License is available at
#  https://www.R-project.org/Licenses/

bartlett.test <- function(x, ...) UseMethod("bartlett.test")

bartlett.test.default <-
function(x, g, ...)
{
    LM <- FALSE
    if (is.list(x)) {
        if (length(x) < 2L)
            stop("'x' must be a list with at least 2 elements")
        DNAME <- deparse(substitute(x))
        if (all(sapply(x, function(obj) inherits(obj, "lm"))))
            LM <- TRUE
        else
            x <- lapply(x, function(x) x <- x[is.finite(x)])
        k <- length(x)
    }
    else {
        if (length(x) != length(g))
            stop("'x' and 'g' must have the same length")
        DNAME <- paste(deparse(substitute(x)), "and",
                       deparse(substitute(g)))
        OK <- complete.cases(x, g)
        x <- x[OK]
        g <- factor(g[OK])
        k <- nlevels(g)
        if (k < 2)
            stop("all observations are in the same group")
        x <- split(x, g)
    }

    if (LM) {
        n <- sapply(x, function(obj) obj$df.resid)
        v <- sapply(x, function(obj) sum(obj$residuals^2))
    } else {
        n <- sapply(x, "length") - 1
        if (any(n <= 0))
            stop("there must be at least 2 observations in each group")
        v <- sapply(x, "var")
    }

    n.total <- sum(n)
    v.total <- sum(n * v) / n.total
    STATISTIC <- ((n.total * log(v.total) - sum(n * log(v))) /
                  (1 + (sum(1 / n) - 1 / n.total) / (3 * (k - 1))))
    PARAMETER <- k - 1
    PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
    names(STATISTIC) <- "Bartlett's K-squared"
    names(PARAMETER) <- "df"

    RVAL <- list(statistic = STATISTIC,
                 parameter = PARAMETER,
                 p.value = PVAL,
                 data.name = DNAME,
                 method = "Bartlett test of homogeneity of variances")
    class(RVAL) <- "htest"
    return(RVAL)
}

bartlett.test.formula <-
function(formula, data, subset, na.action, ...)
{
    if(missing(formula) || (length(formula) != 3L))
        stop("'formula' missing or incorrect")
    m <- match.call(expand.dots = FALSE)
    if(is.matrix(eval(m$data, parent.frame())))
        m$data <- as.data.frame(data)
    ## need stats:: for non-standard evaluation
    m[[1L]] <- quote(stats::model.frame)
    mf <- eval(m, parent.frame())
    if(length(mf) != 2L)
        stop("'formula' should be of the form response ~ group")
    DNAME <- paste(names(mf), collapse = " by ")
    names(mf) <- NULL
    y <- do.call("bartlett.test", as.list(mf))
    y$data.name <- DNAME
    y
}

答案 1 :(得分:1)

这比稍微比Hack-R暗示的更容易被攻击,但他们是正确的,像sapply(split(InsectSprays,spray),function(x) var(x$count))这样的东西(在基础R中完成所有操作)可能更容易。< / p>

此处显示的技术是脆弱的,因为它依赖于内置函数的确切形式;如果在R的未来版本中对函数进行了轻微更改,它将停止工作。更安全dump()整个函数并根据自己的喜好进行修改,然后source()结果。

bb <- stats:::bartlett.test.default
bb2 <- body(bb)
## add a line to save the variances
bb2[[12]] <- quote(ESTIMATE <- v)
## add the variances to the return list
bb2[[13]] <- quote(RVAL <- list(statistic = STATISTIC, parameter = PARAMETER, estimate = ESTIMATE, p.value = PVAL, data.name = DNAME, method = "Bartlett test of homogeneity of variances"))
## restore the rest of the function
bb2[14:15] <- body(bb)[13:14]
body(bb) <- bb2

现在将其放回stats命名空间:

assignInNamespace("bartlett.test.default",bb,pos="package:stats")

测试:

(bt <- bartlett.test(count~spray,data=InsectSprays))
##  Bartlett test of homogeneity of variances
## 
## data:  count by spray
## Bartlett's K-squared = 25.96, df = 5, p-value = 9.085e-05
## sample estimates:
##         A         B         C         D         E         F 
## 22.272727 18.242424  3.901515  6.265152  3.000000 38.606061 

您可以通过bt$estimate检索值。

作为对r-devel的增强,可能值得在bartlett.test上建议:我能想到的唯一反驳论点是,如果有许多群体正在接受测试,那么输出会很笨拙。 / p>