Calculating a confidence interval for the median in Oracle

Time: 2016-01-10 12:13:09

Tags: oracle median

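I am trying to write an Oracle SQL query that computes not only the median age but also the 95% confidence interval around it. To complicate matters, this has to be done per group, by gender. I have a table of people with their age and gender, and I want to determine the median age of each group together with its 95% confidence interval. My current, failing attempt is below: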

select gender,
       median(age),
       count(*),
       percentile_cont(ROUND((COUNT(*)/2)-1.96*sqrt(COUNT(*))/2)/COUNT(*)) 
         within GROUP (ORDER BY age) lowmedianage,
       percentile_cont(ROUND((COUNT(*)/2)+1.96*sqrt(COUNT(*))/2)/COUNT(*)) 
         within GROUP (ORDER BY age) highmedianage
  from persontable
  group by gender

The error I get is "not a GROUP BY expression".

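2 answers:

Answer 0: (score: 1)

The problem here is that you are using the count function as an argument to percentile_cont, which expects a constant, and that constant must be part of the group by clause. You can use a subquery here. Something like: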

select gender, median(age), count(*),
       percentile_cont(low) within GROUP (ORDER BY age) lowmedianage,
       percentile_cont(high) within GROUP (ORDER BY age) highmedianage
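  from (select age, gender,
               ROUND((cnt/2)-1.96*sqrt(cnt)/2)/cnt low,
               ROUND((cnt/2)+1.96*sqrt(cnt)/2)/cnt high
          from (select age, gender,
                       -- rows per gender, attached to every row, so that
                       -- low and high are constant within each gender
                       COUNT(*) OVER (PARTITION BY gender) cnt
                  from persontable))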
 group by gender, low, high

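Answer 1: (score: 1)

Using the formula from this book I ended up with the following query. (I am not sure that you handle the low and high ranges correctly; my interpretation is that you compute a pair of sequence numbers and then have to look up the values at those positions.)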

with tab as 
-- add sequence per group
(
select gender, age,
row_number() over (PARTITION  BY gender order by  gender, age) as seq
from persontable
),
-- get count
N as (select gender, count(*) cnt from persontable group by gender),
-- calculate sequence numbers of the CI
ci_seq as (
select gender,
round(cnt/2 - (1.96 * sqrt(cnt)/2)) r,
round(1 + cnt/2 + (1.96 * sqrt(cnt)/2)) s
from n),
-- calculate median
med as (
select 
  gender,
  median(age) median_age
from  persontable
group by gender),
-- attach the CI sequence numbers to each median
med2 as (
select med.gender, median_age, r, s
from med 
join ci_seq on med.gender = ci_seq.gender 
)
select gender, median_age,
(select age from tab where seq = r and gender = med2.gender) ci_from,
(select age from tab where seq = s and gender = med2.gender) ci_to
from med2
;
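
For reference, the rank formula that the ci_seq step encodes can be written out as below. This is only a transcription of the query's own arithmetic, with $n$ standing for cnt; the symbol $z$ is introduced here for the constant 1.96, and the worked numbers assume a hypothetical group of 100 rows.

```latex
r = \operatorname{round}\!\left(\frac{n}{2} - z\,\frac{\sqrt{n}}{2}\right), \qquad
s = \operatorname{round}\!\left(1 + \frac{n}{2} + z\,\frac{\sqrt{n}}{2}\right), \qquad z = 1.96

% Example with n = 100 (hypothetical group size):
%   r = round(50 - 9.8)     = round(40.2) = 40
%   s = round(1 + 50 + 9.8) = round(60.8) = 61
```

So for a group of 100 people the interval would run from the 40th to the 61st smallest age. For very small groups r can round to 0, in which case no row has that sequence number and ci_from comes back NULL.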

Also note that the formula only approximates the CI. You may also check this thread for an alternative calculation.