Question

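Background

R routinely gives quantiles for well-known distributions. Of these quantiles, the interval from the lower 2.5% up to the upper 97.5% covers 95% of the area under the distribution.

Question:

Suppose I have an F distribution (df1 = 10, df2 = 90). In R, how can I determine the 95% of the area under this distribution such that the 95% covers only the HIGH DENSITY region, rather than the 95% that R normally gives (see my R code)?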

Note: Clearly, the highest density is at the "mode" (the dashed line in the figure below). So I suppose one has to move from the "mode" toward the tails.

Here is my R code:

  // gets all the reviews about an instructor
  app.get('/findInstructorReviews/:instructorID', function(req, res) {

    var commentCollection = req.db.get('comments');

    commentCollection.findOne(
      { _id: ObjectID(req.params.instructorID) },
      {
        $orderby: { 'comments.startDateTime' : -1 } 
      },
    function(err, instructorReviews){debug(err);

      if(err) return res.send({ status: 500, data: 'There was an error retrieving the instructor\'s reviews. Please try again later.' });
      if(!instructorReviews) return res.send({ status: 404, data: 'This instructor does not have any review yet.' });

      return res.send({ status: 200, data: instructorReviews.comments });
    });
  });

Answer 1

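Section 25.2 of DBDA2E gives complete R code for determining the highest density interval (HDI) of a distribution specified in one of three ways: as an inverse cumulative density function (quantile function), as a grid approximation, or as a sample. For the inverse cumulative density function, the function is called HDIofICDF(). It is in the utilities script DBDA2E-utilities.R on the book's web site (linked above). Here is the code: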

HDIofICDF = function( ICDFname , credMass=0.95 , tol=1e-8 , ... ) {
  # Arguments:
  # ICDFname is R’s name for the inverse cumulative density function
  # of the distribution.
  # credMass is the desired mass of the HDI region.
  # tol is passed to R’s optimize function.
  # Return value:
  # Highest density interval (HDI) limits in a vector.
  # Example of use: For determining HDI of a beta(30,12) distribution, type
  # > HDIofICDF( qbeta , shape1 = 30 , shape2 = 12 )
  # Notice that the parameters of the ICDFname must be explicitly named;
  # e.g., HDIofICDF( qbeta , 30 , 12 ) does not work.
  # Adapted and corrected from Greg Snow’s TeachingDemos package.
  incredMass = 1.0 - credMass
  intervalWidth = function( lowTailPr , ICDFname , credMass , ... ) {
    ICDFname( credMass + lowTailPr , ... ) - ICDFname( lowTailPr , ... )
  }
  optInfo = optimize( intervalWidth , c( 0 , incredMass ) , ICDFname=ICDFname ,
    credMass=credMass , tol=tol , ... )
  HDIlowTailPr = optInfo$minimum
  return( c( ICDFname( HDIlowTailPr , ... ) ,
    ICDFname( credMass + HDIlowTailPr , ... ) ) )
}
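
Applied to the F distribution from the question, the same idea can be sketched as follows. This is a minimal, self-contained illustration rather than the book's code: hdi_from_quantile is a hypothetical helper name that condenses the approach above (minimise the interval width over the lower tail probability), and it assumes a unimodal distribution. The equal-tailed interval from qf() is computed alongside for comparison.

```r
# Minimal sketch (hypothetical helper, assumes a unimodal distribution):
# find the shortest interval that holds `cred_mass` of the probability,
# given the distribution's quantile function `qfun`.
hdi_from_quantile <- function(qfun, cred_mass = 0.95, tol = 1e-8, ...) {
  # Width of the interval that starts at lower-tail probability `low_tail`
  width <- function(low_tail) {
    qfun(cred_mass + low_tail, ...) - qfun(low_tail, ...)
  }
  # Search over all admissible lower-tail probabilities
  opt <- optimize(width, c(0, 1 - cred_mass), tol = tol)
  c(lower = qfun(opt$minimum, ...),
    upper = qfun(cred_mass + opt$minimum, ...))
}

# Highest density interval for F(df1 = 10, df2 = 90)
hdi <- hdi_from_quantile(qf, df1 = 10, df2 = 90)

# Equal-tailed interval that R's quantile function gives directly
eti <- qf(c(0.025, 0.975), df1 = 10, df2 = 90)

print(hdi)
print(eti)
```

Both intervals cover 95% of the area, but for a right-skewed distribution such as this one the HDI is shifted toward the mode and is shorter than the equal-tailed interval. The density at the two HDI endpoints is approximately equal, which is the defining property of a highest density interval for a unimodal distribution.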

Answer 2

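Using the `HDR.f` function in the `stat.extend` package

The stat.extend package provides HDR functions for all of the base distributions in R and for some of the distributions in its extension packages. It uses a method based on the quantile function of the distribution, and automatically adjusts for the shape of the distribution (unimodal, bimodal, etc.). Here is how to use the function to compute the HDR you are interested in.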

#Load library
library(stat.extend)

#Compute HDR for an F distribution
HDR.f(cover.prob = 0.9, df1 = 10, df2 = 20)

        Highest Density Region (HDR) 
 
90.00% HDR for F distribution with 10 numerator degrees-of-freedom and 
20 denominator degrees-of-freedom 
Computed using nlm optimisation with 9 iterations (code = 3) 

[0.220947190373167, 1.99228812929142]
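
A quick sanity check of the printed interval is possible in base R alone. The sketch below copies the two endpoints from the output above and uses only pf() and df(); it assumes nothing about stat.extend itself. For a unimodal distribution, an HDR should contain the requested probability mass and have (nearly) equal density at its two endpoints.

```r
# Endpoints copied from the HDR.f output above (90% HDR, df1 = 10, df2 = 20)
lower <- 0.220947190373167
upper <- 1.99228812929142

# Probability mass contained in the interval
coverage <- pf(upper, df1 = 10, df2 = 20) - pf(lower, df1 = 10, df2 = 20)

# Density at each endpoint
densities <- df(c(lower, upper), df1 = 10, df2 = 20)

print(coverage)   # should be close to 0.90
print(densities)  # the two values should be nearly equal
```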

Answer 3

Have you tried this package: https://github.com/robjhyndman/hdrcde?

For example:

library(hdrcde)
hdr.den(rf(1000,10,90),prob=95)

You can request several highest density regions at once, and it also works for multimodal pdfs.

hdr.den(c(rf(1000,10,90),rnorm(1000,4,1)),prob=c(50,75,95))

You can even use it with multivariate distributions to visualize 2D highest density regions:

hdrs=c(50,75,95)
x=c(rf(1000,10,90),rnorm(1000,4,1))
y=c(rf(1000,5,50),rnorm(1000,7,1) )
par(mfrow=c(1,3))
hdr.den(x,prob=hdrs,xlab="x")
hdr.den(y,prob=hdrs,xlab="y")
hdr.boxplot.2d(x,y,prob=hdrs,shadecol="red",xlab="x",ylab="y")
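
If only a single interval is needed from a sample and the distribution is unimodal, a base-R sketch without any package is also possible. Note that this is a different and simpler technique than the kernel density estimate used by hdr.den: it takes the shortest interval that contains the requested fraction of the sorted draws. The helper name hdi_from_sample is hypothetical, and the approach is not suitable for multimodal samples.

```r
# Minimal sketch (hypothetical helper, unimodal case only):
# shortest interval containing `cred_mass` of the sampled values.
hdi_from_sample <- function(x, cred_mass = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(cred_mass * n)           # number of points the interval must hold
  widths <- x[k:n] - x[1:(n - k + 1)]   # width of every candidate interval
  i <- which.min(widths)                # start index of the shortest one
  c(lower = x[i], upper = x[i + k - 1])
}

set.seed(1)                             # reproducible draws
draws <- rf(1e5, df1 = 10, df2 = 90)
sample_hdi <- hdi_from_sample(draws)
print(sample_hdi)
```

With a large number of draws the result approaches the interval obtained from the quantile function, at the cost of some sampling noise.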
