Question

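While trying to port some code from Matlab to R, I have run into a problem. The gist of the code is to produce a 2D kernel density estimate (KDE) and then do some simple calculations with that estimate. In Matlab, the KDE calculation is done with the function ksdensity2d.m. In R, it is done with kde2d from the MASS package. Say I want to calculate the KDE and simply add up the values (this is not what I actually intend to do, but it serves the purpose here). In R, this can be done with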

    library(MASS)
    set.seed(1009)
    x <- sample(seq(1000, 2000), 100, replace=TRUE)
    y <- sample(seq(-12, 12), 100, replace=TRUE)
    kk <- kde2d(x, y, h=c(30, 1.5), n=100, lims=c(1000, 2000, -12, 12))
    sum(kk$z)

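which gives the answer 0.3932732. When using ksdensity2d in Matlab with exactly the same data and conditions, the answer is 0.3768. Looking at the code for kde2d, I noticed that the bandwidth is divided by 4: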

    kde2d <- function (x, y, h, n = 25, lims = c(range(x), range(y)))
    {
        nx <- length(x)
        if (length(y) != nx)
            stop("data vectors must be the same length")
        if (any(!is.finite(x)) || any(!is.finite(y)))
            stop("missing or infinite values in the data are not allowed")
        if (any(!is.finite(lims)))
            stop("only finite values are allowed in 'lims'")
        n <- rep(n, length.out = 2L)
        gx <- seq.int(lims[1L], lims[2L], length.out = n[1L])
        gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
        h <- if (missing(h))
            c(bandwidth.nrd(x), bandwidth.nrd(y))
        else rep(h, length.out = 2L)
        if (any(h <= 0))
            stop("bandwidths must be strictly positive")
        h <- h/4
        ax <- outer(gx, x, "-")/h[1L]
        ay <- outer(gy, y, "-")/h[2L]
        z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay),
            , nx))/(nx * h[1L] * h[2L])
        list(x = gx, y = gy, z = z)
    }

A simple check to see whether the difference in bandwidth is the reason for the difference in results is then

    kk <- kde2d(x, y, h=c(30, 1.5)*4, n=100, lims=c(1000, 2000, -12, 12))
    sum(kk$z)

which gives 0.3768013 (the same as the Matlab answer).

So my question is: why does kde2d divide the bandwidth by four? (Or why doesn't ksdensity2d?)

Answer 1

At the mirrored GitHub source, lines 31-35:

    if (any(h <= 0))
        stop("bandwidths must be strictly positive")
    h <- h/4                            # for S's bandwidth scale
    ax <- outer(gx, x, "-" )/h[1L]
    ay <- outer(gy, y, "-" )/h[2L]

and the help file for kde2d(), which suggests looking at the help file for bandwidth. That says:

...which are all scaled to the width argument of density and so give answers four times as large.

But why?

density() says that the width argument exists for compatibility with S (the predecessor of R). The comments in the source of density() read:

    ## S has width equal to the length of the support of the kernel
    ## except for the gaussian where it is 4 * sd.
    ## R has bw a multiple of the sd.

The default kernel is the Gaussian one. When the bw argument is not specified and width is, width is substituted in, e.g.

    library(MASS)
    set.seed(1)
    x <- rnorm(1000, 10, 2)
    all.equal(density(x, bw = 1), density(x, width = 4)) # Only the call is different

However, because kde2d() was apparently written to remain compatible with S (and I suppose it was originally written FOR S, given that it is in MASS), everything ends up divided by four. After flipping to the relevant section of MASS (around p. 126), it seems they may have picked four to strike a balance between smoothness and fidelity to the data.
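Written as a formula, my reading of the kde2d source quoted in the question is the following. The notation is my own, not MASS's: $(g_x, g_y)$ is a grid point, $h = (h_x, h_y)$ is the bandwidth argument passed in, and $\phi$ is the standard normal density.

```latex
\hat f(g_x, g_y)
  = \frac{1}{n \, h'_x \, h'_y}
    \sum_{i=1}^{n}
      \phi\!\left(\frac{g_x - x_i}{h'_x}\right)
      \phi\!\left(\frac{g_y - y_i}{h'_y}\right),
\qquad h'_x = \frac{h_x}{4}, \quad h'_y = \frac{h_y}{4}
```

So the standard deviations of the Gaussian kernels are $h_x/4$ and $h_y/4$, which is the same "width = 4 * sd" convention that the density() comments describe for S.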

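In conclusion, my guess is that kde2d() divides by four to remain consistent with the rest of MASS (and other things originally written for S), and that the way you are going about things looks fine.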
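As a rough numerical check of this reading, the sketch below compares kde2d() with a hand-rolled product-Gaussian KDE whose kernel standard deviations are the bandwidths themselves. The helper gauss_kde2d is a hypothetical name of mine, not part of MASS, and I am assuming the Matlab function treats its bandwidths as Gaussian standard deviations, which is what the numbers in the question suggest.

```r
library(MASS)

# Hypothetical helper (not part of MASS): product-Gaussian KDE on a grid,
# where sd holds the Gaussian standard deviations in x and y.
gauss_kde2d <- function(x, y, sd, gx, gy) {
  # Kernel values: one row per grid point, one column per data point
  kx <- matrix(dnorm(outer(gx, x, "-"), sd = sd[1]), nrow = length(gx))
  ky <- matrix(dnorm(outer(gy, y, "-"), sd = sd[2]), nrow = length(gy))
  # Sum over data points of the product of the x and y kernels
  tcrossprod(kx, ky) / length(x)
}

# Same data and grid as in the question
set.seed(1009)
x <- sample(seq(1000, 2000), 100, replace = TRUE)
y <- sample(seq(-12, 12), 100, replace = TRUE)
lims <- c(1000, 2000, -12, 12)

sd_xy <- c(30, 1.5)  # bandwidths on the standard-deviation scale

# kde2d expects "width"-scale bandwidths, i.e. 4 * sd for a Gaussian
kk <- kde2d(x, y, h = 4 * sd_xy, n = 100, lims = lims)
zz <- gauss_kde2d(x, y, sd_xy, kk$x, kk$y)

all.equal(kk$z, zz)
```

If the reading is right, all.equal() should report TRUE, so multiplying sd-scale bandwidths by 4 before passing them to kde2d() is the conversion to use.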

Difference in 2D KDE produced using kde2d (R) and ksdensity2d (Matlab)
