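I am trying to write an Oracle SQL query that computes not only the median age but also the 95% confidence interval around it. To add a complication, this has to be done per group, in this case by gender. I have a table of people with their age and gender, and I want the median age of each group together with its 95% confidence interval. My current, failing attempt is: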
select gender,
median(age),
count(*),
percentile_cont(ROUND((COUNT(*)/2)-1.96*sqrt(COUNT(*))/2)/COUNT(*))
within GROUP (ORDER BY age) lowmedianage,
percentile_cont(ROUND((COUNT(*)/2)+1.96*sqrt(COUNT(*))/2)/COUNT(*))
within GROUP (ORDER BY age) highmedianage
from persontable
group by gender
The error I get is "not a GROUP BY expression".
Answer 0 (score: 1)
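The problem is that you use the COUNT function inside the argument of PERCENTILE_CONT. That argument must be constant within each group, so Oracle requires it to be part of the GROUP BY clause. You can work around this with a subquery, something like: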
select gender, median(age), count(*),
percentile_cont(low) within GROUP (ORDER BY age) lowmedianage,
percentile_cont(high) within GROUP (ORDER BY age) highmedianage
from (select age, gender,
-- cnt is the size of the whole gender group, repeated on every row
ROUND(cnt/2 - 1.96*sqrt(cnt)/2)/cnt low,
ROUND(cnt/2 + 1.96*sqrt(cnt)/2)/cnt high
from (select age, gender, count(*) over (partition by gender) cnt
from persontable))
group by gender, low, high
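To see what this query is meant to return, here is a small Python sketch (not part of the answer) that reproduces the computation in memory. Per gender it takes the group size n, turns the rounded ranks into the fractions low and high, and evaluates them with linear interpolation following Oracle's documented PERCENTILE_CONT rule (row number RN = 1 + p*(N - 1), interpolating between the rows at floor(RN) and ceil(RN)). The function names and the (gender, age) tuple input are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def round_half_away(x):
    """Mimic Oracle ROUND(x): halves round away from zero
    (Python's built-in round() rounds halves to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def percentile_cont(p, values):
    """Linear-interpolation percentile following Oracle's PERCENTILE_CONT rule:
    RN = 1 + p*(N-1) over the ascending 1-based rows, interpolating between
    the rows at floor(RN) and ceil(RN)."""
    if not 0 <= p <= 1:
        raise ValueError("percentile must be between 0 and 1")
    xs = sorted(values)
    rn = 1 + p * (len(xs) - 1)
    frn, crn = math.floor(rn), math.ceil(rn)
    if frn == crn:
        return xs[frn - 1]
    return (crn - rn) * xs[frn - 1] + (rn - frn) * xs[crn - 1]


def median_ci_by_group(rows):
    """rows: iterable of (gender, age) pairs (hypothetical input shape).
    Returns {gender: (median, low_bound, high_bound)}."""
    groups = defaultdict(list)
    for gender, age in rows:
        groups[gender].append(age)
    result = {}
    for gender, ages in groups.items():
        n = len(ages)
        # Same expressions as the SQL: a rounded rank divided by n,
        # constant within each gender group.
        low = round_half_away(n / 2 - 1.96 * math.sqrt(n) / 2) / n
        high = round_half_away(n / 2 + 1.96 * math.sqrt(n) / 2) / n
        # MEDIAN(age) is equivalent to PERCENTILE_CONT(0.5).
        result[gender] = (percentile_cont(0.5, ages),
                          percentile_cont(low, ages),
                          percentile_cont(high, ages))
    return result
```

For ten ages 20, 30, ..., 110 this gives a median of 65 with bounds of 38 and 92. Because PERCENTILE_CONT interpolates, the bounds are values between neighbouring ages rather than actual observations.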
Answer 1 (score: 1)
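Using the formulas from this book, I ended up with the query below. (I am not sure you handle the low and high bounds correctly. My interpretation is that you compute a pair of rank numbers and then take the ages found at those positions when the values are sorted in ascending order.)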
with tab as
-- add sequence per group
(
select gender, age,
row_number() over (PARTITION BY gender order by gender, age) as seq
from persontable
),
-- get count
N as (select gender, count(*) cnt from persontable group by gender),
-- calculate sequence numbers of the CI
ci_seq as (
select gender,
round(cnt/2 - (1.96 * sqrt(cnt)/2)) r,
round(1 + cnt/2 + (1.96 * sqrt(cnt)/2)) s
from n),
-- calculate median
med as (
select
gender,
median(age) median_age
from persontable
group by gender),
med2 as (
select med.gender, median_age, r, s
from med
join ci_seq on med.gender = ci_seq.gender
)
select gender, median_age,
(select age from tab where seq = r and gender = med2.gender) ci_from,
(select age from tab where seq = s and gender = med2.gender) ci_to
from med2
;
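The rank-lookup idea behind this query can be checked with a minimal Python sketch. The helper names and the (gender, age) input are hypothetical. It sorts each group, computes r and s exactly as in ci_seq (with Oracle-style rounding), and picks the ages at those 1-based positions. It returns None where the SQL scalar subquery would return NULL because the rank falls outside 1..n.

```python
import math
from collections import defaultdict
from statistics import median


def round_half_away(x):
    """Mimic Oracle ROUND(x): halves round away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def median_rank_ci(rows):
    """rows: iterable of (gender, age) pairs (hypothetical input shape).
    Returns {gender: (median_age, ci_from, ci_to)}; a bound is None when its
    rank lies outside 1..n, as the SQL scalar subquery would yield NULL."""
    groups = defaultdict(list)
    for gender, age in rows:
        groups[gender].append(age)
    result = {}
    for gender, ages in groups.items():
        xs = sorted(ages)  # position i+1 plays the role of seq (row_number)
        n = len(xs)
        # Same rank formulas as the ci_seq CTE.
        r = round_half_away(n / 2 - 1.96 * math.sqrt(n) / 2)
        s = round_half_away(1 + n / 2 + 1.96 * math.sqrt(n) / 2)
        ci_from = xs[r - 1] if 1 <= r <= n else None
        ci_to = xs[s - 1] if 1 <= s <= n else None
        result[gender] = (median(xs), ci_from, ci_to)
    return result
```

With the ten ages 20, 30, ..., 110, n = 10 gives r = 2 and s = 9, so the interval runs from 30 to 100. With only three values both ranks fall outside 1..3 and both bounds are None, which shows that this approximation needs a reasonably large group.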
Note also that the formula only approximates the CI. You can also check this thread for an alternative calculation.
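For reference, the ranks r and s computed in ci_seq come from a normal approximation: the number of observations below the population median follows Binomial(n, 1/2), with mean n/2 and standard deviation sqrt(n)/2, and 1.96 is the two-sided 95% normal quantile. The +1 in s is taken as written in the query. A worked case for n = 100:

```latex
r = \operatorname{round}\!\left(\frac{n}{2} - 1.96\,\frac{\sqrt{n}}{2}\right),
\qquad
s = \operatorname{round}\!\left(1 + \frac{n}{2} + 1.96\,\frac{\sqrt{n}}{2}\right)

n = 100:\quad r = \operatorname{round}(50 - 9.8) = 40,
\qquad s = \operatorname{round}(1 + 50 + 9.8) = 61
```

So for 100 people the interval runs from the 40th to the 61st smallest age.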