I created an approxfun function with the Binsmooth package to estimate the mean from binned data.
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
79816,153581,195430,240948,155139,9452,92166,103217)
splb <- splinebins(binedges, bincounts, 76091)
Calling splb$splineCDF(x) returns y, but I want to find the median, i.e. the x at which y equals 0.5.
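I know the approach from the following question is supposed to achieve this, but it does not seem to work for functions created with the Binsmooth package: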
get x-value given y-value: general root finding for linear / non-linear interpolation function
I have put together a crude way of finding an approximate value, but it is not very satisfying and it is computationally expensive:
splb$splineCDF(50000)
fn(1000)
probability <- 0
income <- 0
while (probability < 0.5) {
  probability <- splb$splineCDF(income)
  income <- income + 10
}
Any ideas?
Answer 0 (score: 0)
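I would be tempted to first try a numerical optimiser to find the median and see whether it is good enough. Verification is easy in this case: check how close splb$splineCDF evaluated at the solution is to .5. You could add a test, for example stopping the script to debug if abs(splb$splineCDF(solution) - .5) > .001.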
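The check described above could be wrapped in a small helper. This is only a sketch: `check_median` and its `tol` argument are hypothetical names, and a log-normal CDF with a known median of 60000 stands in for `splb$splineCDF` so that the example runs without the Binsmooth package.

```r
# Stand-in CDF (assumption): log-normal with a known median of 60000.
# In the question this role is played by splb$splineCDF.
cdf <- function(x) plnorm(x, meanlog = log(60000), sdlog = 0.8)

# Hypothetical helper: stop if the CDF at the candidate median
# is further than `tol` from 0.5; otherwise return the candidate invisibly.
check_median <- function(cdf, solution, tol = 0.001) {
  err <- abs(cdf(solution) - 0.5)
  if (err > tol) {
    stop(sprintf("CDF at %g is %.4f away from 0.5", solution, err))
  }
  invisible(solution)
}

# A correct candidate passes silently.
check_median(cdf, 60000)

# A poor candidate raises an error, caught here for demonstration.
ok <- tryCatch({ check_median(cdf, 90000); TRUE }, error = function(e) FALSE)
```

With the real object, the call would presumably be `check_median(splb$splineCDF, solution)`.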
The solution below uses optimize from the stats package, which ships with base R:
# manual step version
manual_version <- function(splb){
  income <- 0
  # step up in increments of 10 until the CDF first reaches 0.5;
  # testing before incrementing avoids overshooting by one step
  while(splb$splineCDF(income) < 0.5){
    income <- income + 10
  }
  return(income)
}
# try using a one dimensional optimiser - see ?optimize
optim_version <- function(splb, plot=TRUE){
  # requires a continuous function to optimise, with the minimum at the median
  objfun <- function(x){
    (.5-splb$splineCDF(x))^2
  }
  # visualise the objective function (binedges is taken from the global environment)
  if(plot==TRUE){
    x_range <- seq(min(binedges, na.rm=TRUE), max(binedges, na.rm=TRUE), length.out = 100)
    z <- objfun(x_range)
    plot(x_range, z, type="l", main="objective function to minimise")
  }
  # one dimensional optimisation to get point closest to .5 cdf
  out <- optimize(f=objfun, interval = range(binedges, na.rm=TRUE))
  return(out$minimum)
}
# test them out
v1 <- manual_version(splb)
v2 <- optim_version(splb, plot=TRUE)
splb$splineCDF(v1)
splb$splineCDF(v2)
# time them
library(microbenchmark)
microbenchmark("manual"={
manual_version(splb)
}, "optim"={
optim_version(splb, plot=FALSE)
}, times=50)
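As an alternative to minimising a squared distance, a root finder can solve CDF(x) - 0.5 = 0 directly. The sketch below uses uniroot from the stats package. It assumes the CDF is continuous and lies below 0.5 at the lower end of the interval and above 0.5 at the upper end. A log-normal CDF with a known median of 60000 stands in for `splb$splineCDF`, and `median_by_root` is a hypothetical name. Whether this behaves well on the Binsmooth object itself has not been tested here.

```r
# Stand-in CDF (assumption): log-normal with a known median of 60000.
# In the answer above this role is played by splb$splineCDF.
cdf <- function(x) plnorm(x, meanlog = log(60000), sdlog = 0.8)

# Hypothetical helper: find x in `interval` where cdf(x) = p.
# uniroot needs cdf(x) - p to have opposite signs at the two endpoints.
median_by_root <- function(cdf, interval, p = 0.5) {
  res <- uniroot(function(x) cdf(x) - p, interval = interval, tol = 1e-6)
  res$root
}

# Same search interval as range(binedges, na.rm=TRUE) in the answer.
med <- median_by_root(cdf, interval = c(10000, 200000))

# Verify the result the same way as suggested above.
abs(cdf(med) - 0.5)
```

With the real object, the call would presumably be `median_by_root(splb$splineCDF, range(binedges, na.rm=TRUE))`.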