approxfun function from the binsmooth package: finding x from a y value

Time: 2019-05-03 08:27:08

Tags: r

I created an approxfun function with the binsmooth package in order to find the mean from binned data.

library(binsmooth)  # provides splinebins()

binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
              50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
               79816,153581,195430,240948,155139,9452,92166,103217)
splb <- splinebins(binedges, bincounts, 76091)

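Typing splb$splineCDF(x) returns y, but I want to find the median, that is, the x for which y is 0.5.

I know the approach in the following question is supposed to do this, but it doesn't seem to work on functions created with the binsmooth package: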

get x-value given y-value: general root finding for linear / non-linear interpolation function

I have put together a crude way of finding an approximate value, but it is not very satisfying and is computationally expensive:


splb$splineCDF(50000)

fn(1000)

probability<- 0
income<- 0
while(probability< 0.5){
  probability<- splb$splineCDF(income)
  income<- income+ 10
}

Any ideas?

1 Answer:

Answer 0: (score: 0)

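I would be tempted to first try a numerical optimiser to find the median for me and see whether it is good enough. Verification is easy in this case: check how close splb$splineCDF at the solution is to .5. You could add a test, e.g. if abs(splb$splineCDF(solution) - .5) > .001, stop the script and debug.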
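The check described above could be sketched roughly as follows. To keep the snippet runnable without binsmooth, it uses a stand-in log-normal CDF (`cdf_fn`) instead of the real `splb$splineCDF`; `check_median` and its `tol` argument are hypothetical names chosen for illustration.

```r
# Hypothetical helper: stop if the candidate is not close enough to the median.
check_median <- function(cdf_fn, solution, tol = 0.001) {
  err <- abs(cdf_fn(solution) - 0.5)
  if (err > tol) {
    stop("CDF at solution differs from 0.5 by ", signif(err, 3))
  }
  invisible(solution)
}

# Stand-in CDF (assumption, not the binsmooth fit): a log-normal
# distribution, whose median is exp(meanlog).
cdf_fn <- function(x) plnorm(x, meanlog = log(60000), sdlog = 0.8)

check_median(cdf_fn, 60000)  # passes silently
```

With the real fit, the equivalent call would presumably be `check_median(splb$splineCDF, solution)`, where `solution` is whatever estimate you obtained.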

The solution uses optimize from the base R package stats:

# manual step version
manual_version <- function(splb){
  probability<- 0
  income<- 0
  while(probability< 0.5){
    probability<- splb$splineCDF(income)
    income<- income+ 10
  }
  return(income)
}

# try using a one dimensional optimiser - see ?optimize
optim_version <- function(splb, plot=TRUE){
  # requires a continuous function to optimise, with the minimum at the median
  objfun <- function(x){
    (.5-splb$splineCDF(x))^2
  }

  # visualise the objective function
  if (plot) {
    x_range <- seq(min(binedges, na.rm = TRUE), max(binedges, na.rm = TRUE), length.out = 100)
    z <- objfun(x_range)
    plot(x_range, z, type="l", main="objective function to minimise")
  }

  # one dimensional optimisation to get point closest to .5 cdf
  out <- optimize(f=objfun, interval = range(binedges, na.rm=TRUE))

  return(out$minimum)
}

# test them out
v1 <- manual_version(splb)
v2 <- optim_version(splb, plot=TRUE)
splb$splineCDF(v1)
splb$splineCDF(v2)

# time them
library(microbenchmark)
microbenchmark("manual"={
  manual_version(splb)
}, "optim"={
  optim_version(splb, plot=FALSE)
}, times=50)
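As an alternative to minimising a squared distance, the median can also be treated as a root-finding problem, in the spirit of the question linked above: solve CDF(x) - 0.5 = 0 with `uniroot` from stats. This is a different technique from the `optimize` approach, shown here only as a minimal sketch. The CDF is a stand-in built with `approxfun` from made-up knots, and `inverse_cdf` is a hypothetical helper name.

```r
# Stand-in for splb$splineCDF (assumption): a monotone CDF interpolated
# from a few illustrative points with stats::approxfun.
x_knots <- c(0, 25000, 50000, 100000, 200000)
p_knots <- c(0, 0.20, 0.45, 0.80, 1)
cdf_fn  <- approxfun(x_knots, p_knots, rule = 2)

# Hypothetical helper: find x such that cdf_fn(x) == p by root finding.
# uniroot needs cdf_fn(x) - p to have opposite signs at the interval ends.
inverse_cdf <- function(cdf_fn, p, interval) {
  uniroot(function(x) cdf_fn(x) - p, interval = interval, tol = 1e-6)$root
}

med <- inverse_cdf(cdf_fn, 0.5, interval = c(0, 200000))
cdf_fn(med)  # should be very close to 0.5
```

With the fitted object, something like `inverse_cdf(splb$splineCDF, 0.5, range(binedges, na.rm = TRUE))` should work, provided the CDF is below 0.5 at the lower end of the interval and above 0.5 at the upper end. The same helper gives other quantiles by changing `p`.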