我想使用AVG来获取某些值的平均值,但只有当它们低于或高于第二个最大值和最小值时才忽略最大值和最小值。我会举几个例子:
示例1:
SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL ) D
我需要这个结果,忽略103.1值:
100.5
101.5
示例2:
SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL UNION
SELECT 106.2 v FROM DUAL) D
我需要这个结果,只忽略106.2值:
100.5
101.1
103.1
示例3:
SELECT *
FROM (
SELECT 100.0 v FROM DUAL UNION
SELECT 102.0 v FROM DUAL UNION
SELECT 103.0 v FROM DUAL UNION
SELECT 105.0 v FROM DUAL UNION
SELECT 107.0 v FROM DUAL) D
我需要这个结果,忽略100.0和107.0值:
102.0
103.0
105.0
当只有两个值时,它无关紧要。 如果结果正确,我可以正确AVG(值)。
答案 0 :(得分:3)
您需要结合使用分析函数(超前/滞后)和条件聚合。这就是我想出的。请注意,我允许多个组,"调整"必须分别为每个组计算平均值(当你必须抛弃每个组中的异常值时,统计中的常见任务):
with
inputs ( id, val ) as (
select 101, 33 from dual union all
select 102, 23 from dual union all
select 102, 22.8 from dual union all
select 103, 30 from dual union all
select 103, 40 from dual union all
select 104, 90 from dual union all
select 104, 92 from dual union all
select 104, 92 from dual union all
select 104, 91.5 from dual union all
select 104, 91.7 from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id,
avg ( case when cnt >= 3
and ( lag_val is null and lead_val - val >= 1.5
or
lead_val is null and val - lag_val >= 1.5
)
then null
else val
end
) as adjusted_avg_val
from (
select id, val, count(val) over (partition by id) as cnt,
lag ( val ) over ( partition by id order by val ) as lag_val,
lead ( val ) over ( partition by id order by val ) as lead_val
from inputs
)
group by id
;
<强>输出强>:
ID ADJUSTED_AVG_VAL
--- ----------------
101 33
102 22.9
103 35
104 91.8
答案 1 :(得分:1)
尝试使用以下row_number
lead
和lag
with cte as (
SELECT 100.5 v FROM DUAL UNION ALL
SELECT 101.5 v FROM DUAL UNION ALL
SELECT 103.1 v FROM DUAL UNION ALL
SELECT 106.2 v FROM DUAL)
-- end of sample data
select avg(v)
from
(
select row_number() over (order by v desc) arn,
row_number() over (order by v) drn,
lag(v) over (order by v) av,
lead(v) over (order by v) dv,
v
from cte
) t
where (arn != 1 and drn != 1) or -- if they are no maximum nor minumum
(drn = 1 and v + 1.5 > dv) or -- if they are minimum
(arn = 1 and v - 1.5 < av) or -- if they are maximum
(av is null and arn < 3) or -- if there are just two ore one value
(dv is null and drn < 3) -- if there are just two ore one value
答案 2 :(得分:0)
在SQL中,你只需要表达结果,所以......
WITH D as(
SELECT 100.0 v FROM DUAL UNION
SELECT 102.0 FROM DUAL UNION
SELECT 103.0 FROM DUAL UNION
SELECT 105.0 FROM DUAL UNION
SELECT 107.0 FROM DUAL)
SELECT avg(v)
FROM D
where (v < (select max(v) from D )
and ((select max(v) from D )
-(select max(v) from D where v !=
(select max(v) from D ) ) > 1.5))
or
(v > (select min(v) from D )
and ((select min(v) from D )
+(select min(v) from D where v !=
(select min(v) from D ) ) > 1.5))
......应该做的伎俩!
但是提前考虑......以下版本也可能有用;)
WITH D as(
SELECT 1 PK, 100.0 v FROM DUAL UNION
SELECT 1,102.0 FROM DUAL UNION
SELECT 1,103.0 FROM DUAL UNION
SELECT 1,105.0 FROM DUAL UNION
SELECT 1,107.0 FROM DUAL)
SELECT PK,avg(v)
FROM D
where (v < (select max(v) from D group by PK)
and ((select max(v) from D group by PK)
-(select max(v) from D where v !=
(select max(v) from D group by PK) group by PK) > 1.5))
or
(v > (select min(v) from D group by PK)
and ((select min(v) from D group by PK)
+(select min(v) from D where v !=
(select min(v) from D group by PK) group by PK) > 1.5))
GROUP BY PK
在现实生活中,你也会考虑大型数据集中的上述执行计划(作业)。
如有任何进一步的澄清,我可以通过评论随时为您服务。
此致 泰德