Table 1
date_time | make | model | miles | reg_no | age_months
----------------------------------------------------------------------
2016-09-28 20:05:03.001 | toyota | prius | 10200 | 1111 | 22
2016-09-28 20:06:03.001 | suzuki | sx4 | 10300 | 1122 | 12
2016-09-28 20:09:03.001 | suzuki | sx4 | 11200 | 1133 | 34
2016-09-28 20:10:03.001 | toyota | prius | 15200 | 1144 | 28
2017-05-28 20:11:03.001 | toyota | prius | 15500 | 1144 | 36
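For the data in Table 1 above, I want to compute some aggregates (mean, median, q1, q3, IQR, etc.) of miles per month, grouped by model.
My query is below, but it fails with the error: aggregate functions cannot be nested
What is the correct way to do this?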
select
model
, COUNT(DISTINCT reg_no) AS distinct_car_count
, COUNT(*) AS records_count
, ROUND(AVG(miles/age_months*1.0),2) AS miles_per_month_avg
, ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_med
, ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q1
, ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q3
, miles_per_month_q3 - miles_per_month_q1 as miles_per_month_iqr
, sum(case when miles/age_months*1.0 < (miles_per_month_q1 - 1.5*miles_per_month_iqr) then 1 else 0 end) as miles_per_month_num_records_outliers_lower_bound
, sum(case when miles/age_months*1.0 > (miles_per_month_q3 + 1.5*miles_per_month_iqr) then 1 else 0 end) as miles_per_month_records_outliers_upper_bound
, ROUND(stddev_pop(miles/age_months*1.0),2) as miles_per_month_stddev
from table1 a
group by model;
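Answer 0 (score: 2)
There are two problems:
#1: You can't nest aggregates (as the error message clearly states): miles_per_month_q1 is an aggregated column, and you try to use it inside another aggregate, miles_per_month_num_records_outliers_lower_bound.
#2: You try to reuse the column alias miles_per_month_q1 in the calculation of miles_per_month_iqr, which is not allowed in standard SQL.
In both cases you need to add another level of nesting (i.e. a derived table or a common table expression); in your case it could look like this: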
SELECT
a.model
, Count(DISTINCT reg_no) AS distinct_car_count
, Count(*) AS records_count
, Round(Avg(miles/age_months*1.0),2) AS miles_per_month_avg
-- the percentile columns can now be referenced, but each needs a dummy aggregate such as MIN or MAX (the value is always the same for a given model)
, Min(percentiles.miles_per_month_med) AS miles_per_month_med
, Min(percentiles.miles_per_month_q1) AS miles_per_month_q1
, Min(percentiles.miles_per_month_q3) AS miles_per_month_q3
, Min(percentiles.miles_per_month_q3 - percentiles.miles_per_month_q1) AS miles_per_month_iqr
-- this is no longer a nested aggregation
, Sum(CASE WHEN miles/age_months*1.0 < (percentiles.miles_per_month_q1 - 1.5* (percentiles.miles_per_month_q3 - percentiles.miles_per_month_q1)) THEN 1 ELSE 0 end) AS miles_per_month_num_records_outliers_lower_bound
, Sum(CASE WHEN miles/age_months*1.0 > (percentiles.miles_per_month_q3 + 1.5* (percentiles.miles_per_month_q3 - percentiles.miles_per_month_q1)) THEN 1 ELSE 0 end) AS miles_per_month_records_outliers_upper_bound
, Round(StdDev_Pop(miles/age_months*1.0),2) AS miles_per_month_stddev
FROM table1 a
JOIN
( -- calculate the nested aggregates first
SELECT
model
, Round(Percentile_Cont(0.5) Within GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_med
, Round(Percentile_Cont(0.25) Within GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q1
, Round(Percentile_Cont(0.75) Within GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q3
FROM table1 a
GROUP BY model
) AS percentiles
ON a.model = percentiles.model
GROUP BY a.model
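The same two-level idea can also be written with common table expressions instead of a derived table. The following is only a sketch, assuming an engine that supports PERCENTILE_CONT as an ordered-set aggregate. The names per_row, miles_per_month, med, q1 and q3 are made up for illustration and do not appear in the question. It also assumes miles and age_months may be integer columns, so it multiplies by 1.0 before dividing. Rounding is left out so that the outlier bounds are compared against unrounded quartiles; it can be added back in the final SELECT.

```sql
-- Sketch only: per_row, miles_per_month, med, q1 and q3 are illustrative names.
WITH per_row AS (
    -- Compute the ratio once; multiplying by 1.0 first avoids integer
    -- division if miles and age_months are integers (an assumption).
    SELECT model
         , reg_no
         , miles * 1.0 / age_months AS miles_per_month
    FROM table1
),
percentiles AS (
    -- First aggregation level: one row of quartiles per model.
    SELECT model
         , PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY miles_per_month) AS med
         , PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY miles_per_month) AS q1
         , PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY miles_per_month) AS q3
    FROM per_row
    GROUP BY model
)
-- Second aggregation level: the quartiles are plain columns here,
-- so using them inside SUM(CASE ...) is not a nested aggregate.
SELECT r.model
     , COUNT(DISTINCT r.reg_no)   AS distinct_car_count
     , COUNT(*)                   AS records_count
     , AVG(r.miles_per_month)     AS miles_per_month_avg
     , p.med                      AS miles_per_month_med
     , p.q1                       AS miles_per_month_q1
     , p.q3                       AS miles_per_month_q3
     , p.q3 - p.q1                AS miles_per_month_iqr
     , SUM(CASE WHEN r.miles_per_month < p.q1 - 1.5 * (p.q3 - p.q1) THEN 1 ELSE 0 END)
           AS miles_per_month_num_records_outliers_lower_bound
     , SUM(CASE WHEN r.miles_per_month > p.q3 + 1.5 * (p.q3 - p.q1) THEN 1 ELSE 0 END)
           AS miles_per_month_records_outliers_upper_bound
     , STDDEV_POP(r.miles_per_month) AS miles_per_month_stddev
FROM per_row r
JOIN percentiles p
  ON r.model = p.model
GROUP BY r.model, p.med, p.q1, p.q3;
```

Listing the quartile columns in GROUP BY is an alternative to wrapping them in a dummy MIN or MAX: they hold a single value per model, so the grouping is unchanged either way.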
Answer 1 (score: 0)
This is what's killing you:
,ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q1
, ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q3
You are applying an aggregate function (SUM) to expressions (miles_per_month_q1, miles_per_month_q3) that are themselves built on another aggregate function (PERCENTILE_CONT).
select
model
, COUNT(DISTINCT reg_no) AS distinct_car_count
, COUNT(*) AS records_count
, ROUND(AVG(miles/age_months*1.0),2) AS miles_per_month_avg
, ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_med
, min (miles_per_month_q1) as miles_per_month_q1
, min (miles_per_month_q3) as miles_per_month_q3
, min (miles_per_month_q3) - min (miles_per_month_q1) as miles_per_month_iqr
, sum(case when miles/age_months*1.0 < (miles_per_month_q1 - 1.5*(miles_per_month_q3 - miles_per_month_q1)) then 1 else 0 end) as miles_per_month_num_records_outliers_lower_bound
, sum(case when miles/age_months*1.0 > (miles_per_month_q3 + 1.5*(miles_per_month_q3 - miles_per_month_q1)) then 1 else 0 end) as miles_per_month_records_outliers_upper_bound
, ROUND(stddev_pop(miles/age_months*1.0),2) as miles_per_month_stddev
from (select a.*
, ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC) over (partition by model),2) AS miles_per_month_q1
, ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC) over (partition by model),2) AS miles_per_month_q3
from table1 a
) a
group by model
;
Split the code into an inner query that computes PERCENTILE_CONT and an outer query that applies SUM.