Question

I am reading the paper by Lourme A. et al. (2016) and I want to draw the 90% confidence boundary and the 10% of exceptional points, as in Figure 2 of the paper.

I can't use LaTeX here, so I inserted a picture with the definition of the confidence region:

library("MASS")
library(copula)
set.seed(612)

n <- 1000 # length of sample
d <- 2    # dimension

# random vector with uniform margins on (0,1)
u1 <- runif(n, min = 0, max = 1)
u2 <- runif(n, min = 0, max = 1)

u = matrix(c(u1, u2), ncol=d)

Rg  <- cor(u)   # d-by-d correlation matrix
Rg1 <- ginv(Rg) # inv. matrix 

# round(Rg %*% Rg1, 8) # check

# the multivariate c.d.f of u is a Gaussian copula 
# with parameter Rg[1,2]=0.02876654

normal.cop = normalCopula(Rg[1,2], dim=d)
fit.cop    = fitCopula(normal.cop, u, method="itau") #fitting
# Rg.hat     = fit.cop@estimate[1]
# [1] 0.03097071
sim        = rCopula(n, normal.cop) # in (0,1)

# Taking the quantile function of N1(0, 1)

y1 <- qnorm(sim[,1], mean = 0, sd = 1)
y2 <- qnorm(sim[,2], mean = 0, sd = 1)

par(mfrow=c(2,2))

plot(y1, y2, col="red");  abline(v=mean(y1), h=mean(y2))
plot(sim[,1], sim[,2], col="blue")
hist(y1); hist(y2)

Reference. Lourme, A., F. Maurer (2016) Testing the Gaussian and Student's t copulas in a risk management framework. Economic Modelling.

Question. Could someone help me and explain the variables v=(v_1,...,v_d) and G(v_1),..., G(v_d) in the equation?

I think v is a non-random matrix whose dimensions should be $k^2$ (grid points) by $d = 2$ (dimensions). For example,

axis_x <- seq(0, 1, 0.1) # 11 grid points
axis_y <- seq(0, 1, 0.1) # 11 grid points
v <- expand.grid(axis_x, axis_y)
plot(v,  type = "p")

Answer 1

So, your question is about the vector nu and the corresponding G(nu).

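nu is simply a random vector drawn from any distribution with domain (0,1) (here I use the uniform distribution). Since you want your samples in 2D, a single nu can be nu = runif(2). Given the explanations above, G is a Gaussian transform with mean 0 and matrix Rg (Rg has dimension 2x2 in 2D); in the code below it is evaluated componentwise as qnorm, the standard normal quantile function, not as a pdf.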

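Now, what the paragraph says is this: if you have a random sample nu and you want it to come from Gamma, given the number of dimensions d and a confidence level alpha, then you need to compute the statistic (G(nu) %*% Rg^-1) %*% G(nu) and check that it is below the alpha quantile of the Chi^2 distribution with d degrees of freedom.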

For example:

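# This is the copula parameter: a symmetric 2x2 correlation matrix
# (the same off-diagonal value must appear in both positions)
rho <- runif(1)
Rg <- matrix(c(1, rho, rho, 1), ncol = 2)
# We need its inverse to compute the statistic
Rginv <- MASS::ginv(Rg)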

sampleResult <- replicate(10000, {
  # we draw our nu from uniform, but others that map to (0,1), e.g. beta, are possible, too
  nu <- runif(2)
  # we compute G(nu): the standard normal quantile function (qnorm) applied to the sample
  Gnu <- qnorm(nu, mean = 0, sd = 1)
  # for this we compute the statistic as given in formula
  stat <- (Gnu %*% Rginv) %*% Gnu
  # and return the result
  list(nu = nu, Gnu = Gnu, stat = stat)
})

theSamples <- sapply(sampleResult["nu",], identity)

# this is the critical value of the Chi^2 with alpha = 0.95 and df = number of dimensions
# old and buggy threshold <- pchisq(0.95, df = 2)
# new and awesome - we are looking for the statistic at alpha = .95 quantile
threshold <- qchisq(0.95, df = 2)
# we can accept samples given the threshold (like in equation)
inArea <- sapply(sampleResult["stat",], identity) < threshold

plot(t(theSamples), col = as.integer(inArea)+1)

The red points are the ones you would keep (I am plotting all points here).
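
As a sanity check on the threshold, here is a minimal sketch that draws points from a Gaussian copula instead of independent uniforms and counts how many fall inside the region. It assumes an illustrative correlation rho = 0.5 and the 90% level from the question; the names rho, n, z, u and inside are made up for this example and do not come from the paper. If the statistic and the qchisq threshold fit together, roughly 90% of the points should land inside and about 10% outside.

```r
set.seed(1)
n     <- 10000
alpha <- 0.90          # 90% region, as in the question (assumption for this sketch)
rho   <- 0.5           # illustrative correlation, not taken from the paper
Rg    <- matrix(c(1, rho, rho, 1), ncol = 2)

# Correlated standard normals: each row of z has covariance Rg
z <- matrix(rnorm(2 * n), ncol = 2) %*% chol(Rg)
# Map to (0,1)^2: u is a sample from the Gaussian copula with parameter Rg
u <- pnorm(z)

# Statistic G(u)' Rg^-1 G(u) for every row, with G = qnorm
Gu   <- qnorm(u)
stat <- rowSums((Gu %*% solve(Rg)) * Gu)

inside <- stat < qchisq(alpha, df = 2)
mean(inside)           # share of points inside the region, expected to be near alpha

plot(u, col = ifelse(inside, "red", "black"), xlab = "u1", ylab = "u2")
```

With independent uniforms, as in the example above, the share of red points only matches alpha when the off-diagonal entry of Rg is zero, which is why this sketch samples from the copula itself.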

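As for plotting the decision boundary, I think it is a bit more complicated, because you need to compute the exact pairs nu for which (Gnu %*% Rginv) %*% Gnu == qchisq(alpha, df = 2). This is an equation you solve for Gnu (it is quadratic in Gnu, so the solutions form an ellipse), and then you apply the inverse transform to get the nu on the decision boundary.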
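
One possible way to get that boundary without a numerical solver is to note that the Gnu satisfying the equation lie on an ellipse, which can be parameterised through a Cholesky factor of Rg and then mapped back with pnorm. A minimal sketch, assuming the qchisq threshold from the corrected code above and an illustrative rho = 0.5 (rho, theta, circle and boundary are names made up for this example):

```r
alpha <- 0.95
rho   <- 0.5                                   # illustrative correlation
Rg    <- matrix(c(1, rho, rho, 1), ncol = 2)

# Radius of the ellipse in the Gaussian scale
r <- sqrt(qchisq(alpha, df = 2))

# Points on the unit circle, one per column
theta  <- seq(0, 2 * pi, length.out = 400)
circle <- rbind(cos(theta), sin(theta))

# chol() returns the upper factor U with t(U) %*% U == Rg, so L = t(U) is the lower factor
L <- t(chol(Rg))
# Each column z of Gnu_boundary satisfies z' Rg^-1 z == r^2
Gnu_boundary <- r * (L %*% circle)

# Apply the inverse of qnorm to move back to (0,1)^2
boundary <- pnorm(t(Gnu_boundary))

plot(boundary, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "nu1", ylab = "nu2")
```

The curve can be overlaid on the scatter plot above with lines(boundary) instead of plot().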

Edit: Reading the paragraph again, I noticed that the parameters of Gnu do not change; it is simply Gnu <- qnorm(nu, mean = 0, sd = 1).

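Edit: There was a bug: for the threshold you need to use the quantile function qchisq rather than the distribution function pchisq. This is now corrected in the code above (and the figures are updated).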

Answer 2

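There are two parts to this: first, computing the copula value as a function of X and Y; then, plotting the curve that marks the boundary where the copula exceeds the threshold.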

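Computing the value is basically the linear algebra that @drey has already answered. Here is a rewritten version, so that the copula is given by a function.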

cop1 <- function(x)
{
    Gnu <- qnorm(x)
    Gnu %*% Rginv %*% Gnu
}

copula <- function(x)
{
    apply(x, 1, cop1)
}

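Plotting the boundary curve can be done using the same method as here (which in turn is the method used by the textbooks Modern Applied Statistics with S and The Elements of Statistical Learning). Create a grid of values, and use interpolation to find the contour line at a given height.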

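# symmetric 2x2 correlation matrix: the same off-diagonal value in both positions
rho <- runif(1)
Rg <- matrix(c(1, rho, rho, 1), ncol = 2)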
Rginv <- MASS::ginv(Rg)

# draw the contour line where value == threshold
# define a grid of values first: avoid x and y = 0 and 1, where infinities exist
xlim <- 1e-3
delta <- 1e-3
xseq <- seq(xlim, 1-xlim, by=delta)
grid <- expand.grid(x=xseq, y=xseq)
prob.grid <- copula(grid)
threshold <- qchisq(0.95, df=2)

contour(x=xseq, y=xseq, z=matrix(prob.grid, nrow=length(xseq)), levels=threshold,
        col="grey", drawlabels=FALSE, lwd=2)

# add some points
data <- data.frame(x=runif(1000), y=runif(1000))
points(data, col=ifelse(copula(data) < threshold, "red", "black"))
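
The grid above has roughly a million rows, so calling a function once per row through apply() can be slow. As a possible alternative, here is a sketch of a vectorised version of the same statistic; copula_vec is a hypothetical name introduced for this example, it takes Rginv as an argument rather than reading it from the global environment, and rho = 0.5 is just an illustrative value.

```r
# Vectorised statistic: one matrix product instead of a per-row apply()
copula_vec <- function(x, Rginv) {
    Gnu <- qnorm(as.matrix(x))
    unname(rowSums((Gnu %*% Rginv) * Gnu))
}

# Small self-check against the row-by-row definition
rho   <- 0.5                                   # illustrative value
Rg    <- matrix(c(1, rho, rho, 1), ncol = 2)
Rginv <- solve(Rg)
pts   <- data.frame(x = c(0.1, 0.5, 0.9), y = c(0.2, 0.5, 0.7))

byrow <- apply(pts, 1, function(p) {
    g <- qnorm(p)
    drop(g %*% Rginv %*% g)
})

# TRUE when both versions agree up to numerical tolerance
all.equal(unname(byrow), copula_vec(pts, Rginv))
```

In the contour code, prob.grid <- copula_vec(grid, Rginv) could then stand in for copula(grid).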

How to plot the $\alpha$ confidence region on a 2D plot?
