Avg Sum ignoring max or min values

Time: 2017-10-03 19:02:44

Tags: sql oracle

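I want to use AVG to get the average of some values, but ignoring the maximum and minimum values only when they lie far enough above the second-highest or below the second-lowest value (the answers below use a gap of 1.5). Here are a few examples:

Example 1: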

SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL ) D

I need this result, ignoring the 103.1 value:

100.5
101.5

Example 2:

SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL UNION
SELECT 106.2 v FROM DUAL) D

I need this result, ignoring only the 106.2 value:

100.5
101.5
103.1

Example 3:

SELECT *
FROM (
SELECT 100.0 v FROM DUAL UNION
SELECT 102.0 v FROM DUAL UNION
SELECT 103.0 v FROM DUAL UNION
SELECT 105.0 v FROM DUAL UNION
SELECT 107.0 v FROM DUAL) D

I need this result, ignoring the 100.0 and 107.0 values:

102.0
103.0
105.0

When there are only two values, it doesn't matter. Once the result set is right, I can simply take AVG(v).
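To sanity-check the expected results outside the database, the rule can be sketched in Python. This is only an illustration under assumptions: the gap of 1.5 and the `>=` comparison are borrowed from the first answer in this thread, and the name `trim_extremes` is made up for the sketch.

```python
from statistics import mean


def trim_extremes(values, gap=1.5):
    """Drop the minimum and/or maximum when it is at least `gap` away
    from its nearest neighbour. `gap` and `>=` are assumptions."""
    ordered = sorted(values)
    if len(ordered) < 3:
        # With one or two values nothing is dropped.
        return ordered
    lo, hi = 0, len(ordered)
    if ordered[1] - ordered[0] >= gap:
        lo = 1          # the minimum is isolated, skip it
    if ordered[-1] - ordered[-2] >= gap:
        hi -= 1         # the maximum is isolated, skip it
    return ordered[lo:hi]


if __name__ == "__main__":
    examples = [
        [100.5, 101.5, 103.1],
        [100.5, 101.5, 103.1, 106.2],
        [100.0, 102.0, 103.0, 105.0, 107.0],
    ]
    for sample in examples:
        kept = trim_extremes(sample)
        print(kept, mean(kept))
```

Running it against the three examples gives the rows listed above, which makes it a handy reference when comparing the SQL answers.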

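3 answers:

Answer 0: (score: 3)

You need to combine analytic functions (lead/lag) with conditional aggregation. This is what I came up with. Note that I allow for multiple groups; the "adjusted" average must be computed separately for each group (a common task in statistics, when you must throw out the outliers in each group):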

with
     inputs ( id, val ) as (
       select 101, 33   from dual union all
       select 102, 23   from dual union all
       select 102, 22.8 from dual union all
       select 103, 30   from dual union all
       select 103, 40   from dual union all
       select 104, 90   from dual union all
       select 104, 92   from dual union all
       select 104, 92   from dual union all
       select 104, 91.5 from dual union all
       select 104, 91.7 from dual
     )
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select    id, 
          avg ( case when cnt >= 3 
                        and ( lag_val is null and lead_val - val >= 1.5 
                              or
                              lead_val is null and val - lag_val >= 1.5
                            )
                     then null
                     else val
                end
              ) as adjusted_avg_val
from      (
            select id, val, count(val) over (partition by id) as cnt,
                   lag  ( val ) over ( partition by id order by val ) as lag_val,
                   lead ( val ) over ( partition by id order by val ) as lead_val
            from   inputs
          )
group by id
;

Output:

 ID  ADJUSTED_AVG_VAL
---  ----------------
101                33
102              22.9
103                35                      
104              91.8
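For readers who want to trace how the output above comes about, here is a rough Python rendering of the same idea: build a lag and a lead value per row within each group, then average only the rows that the CASE expression would not turn into NULL. It is a sketch, not a translation of the Oracle semantics in every detail; the function name `adjusted_avg` and the use of `None` for NULL are choices made for this illustration.

```python
from collections import defaultdict


def adjusted_avg(rows, gap=1.5):
    """rows: iterable of (id, val) pairs. Returns {id: adjusted average}."""
    groups = defaultdict(list)
    for key, val in rows:
        groups[key].append(val)

    result = {}
    for key, vals in groups.items():
        vals.sort()                      # order by val within the partition
        cnt = len(vals)                  # count(val) over (partition by id)
        kept = []
        for i, val in enumerate(vals):
            lag_val = vals[i - 1] if i > 0 else None
            lead_val = vals[i + 1] if i < cnt - 1 else None
            low_outlier = (lag_val is None and lead_val is not None
                           and lead_val - val >= gap)
            high_outlier = (lead_val is None and lag_val is not None
                            and val - lag_val >= gap)
            if cnt >= 3 and (low_outlier or high_outlier):
                continue                 # CASE yields NULL, AVG skips the row
            kept.append(val)
        result[key] = sum(kept) / len(kept)
    return result


inputs = [
    (101, 33), (102, 23), (102, 22.8), (103, 30), (103, 40),
    (104, 90), (104, 92), (104, 92), (104, 91.5), (104, 91.7),
]

if __name__ == "__main__":
    for key, avg in sorted(adjusted_avg(inputs).items()):
        print(key, round(avg, 4))
```

In group 104 only the value 90 is dropped: it sits exactly 1.5 below 91.5, while the two tied maxima of 92 have a gap of 0 to each other and are kept.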

Answer 1: (score: 1)

Try the following combination of row_number, lead and lag:

with cte as (
SELECT 100.5 v FROM DUAL UNION ALL
SELECT 101.5 v FROM DUAL UNION ALL
SELECT 103.1 v FROM DUAL UNION ALL
SELECT 106.2 v FROM DUAL)
-- end of sample data
select avg(v)
from
(
  select row_number() over (order by v desc) arn, 
         row_number() over (order by v) drn, 
         lag(v) over (order by v) av,
         lead(v) over (order by v) dv,
         v
  from cte
) t
where (arn != 1 and drn != 1) or        -- neither the maximum nor the minimum
        (drn = 1 and v + 1.5 > dv) or   -- the minimum, within 1.5 of the next value
        (arn = 1 and v - 1.5 < av) or   -- the maximum, within 1.5 of the previous value
        (av is null and arn < 3) or     -- the minimum when there are only one or two values
        (dv is null and drn < 3)        -- the maximum when there are only one or two values
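The WHERE clause above packs five cases into one predicate, so it can help to replay it row by row. The following Python sketch mimics the derived columns (`arn`, `drn`, `av`, `dv`) and the predicate, with `None` standing in for SQL NULL. It assumes distinct values, since `row_number()` breaks ties arbitrarily, and the helper names `keep` and `filtered` are invented for this illustration.

```python
def keep(arn, drn, av, dv, v, gap=1.5):
    """Python rendering of the WHERE clause; a comparison against None
    is treated as not true, as a comparison against NULL would be."""
    return (
        (arn != 1 and drn != 1)                            # not an extreme
        or (drn == 1 and dv is not None and v + gap > dv)  # minimum, close to next
        or (arn == 1 and av is not None and v - gap < av)  # maximum, close to previous
        or (av is None and arn < 3)                        # minimum of 1 or 2 rows
        or (dv is None and drn < 3)                        # maximum of 1 or 2 rows
    )


def filtered(values, gap=1.5):
    """Return the values the query would feed into avg(v)."""
    asc = sorted(values)
    n = len(asc)
    rows = []
    for i, v in enumerate(asc):
        drn = i + 1                              # row_number() over (order by v)
        arn = n - i                              # row_number() over (order by v desc)
        av = asc[i - 1] if i > 0 else None       # lag(v)
        dv = asc[i + 1] if i < n - 1 else None   # lead(v)
        if keep(arn, drn, av, dv, v, gap):
            rows.append(v)
    return rows


if __name__ == "__main__":
    print(filtered([100.5, 101.5, 103.1, 106.2]))
    print(filtered([100.0, 105.0]))
```

The last two branches are what keep both rows of a two-value set even when they are far apart, which matches the question's remark that two values need no special handling.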

Answer 2: (score: 0)

In SQL, you only need to express the result you want, so...

WITH D as(
SELECT  100.0 v FROM DUAL UNION
SELECT 102.0 FROM DUAL UNION
SELECT 103.0 FROM DUAL UNION
SELECT 105.0 FROM DUAL UNION
SELECT 107.0 FROM DUAL)
SELECT  avg(v)
FROM D
-- keep a row unless it is the maximum (or minimum) and lies more than 1.5
-- away from the second-highest (or second-lowest) value;
-- with only one or two rows this can filter out every row, so avg(v) is NULL
where (v < (select max(v) from D )
       or ((select max(v) from D )
            -(select max(v) from D where v != 
              (select max(v) from D ) ) <= 1.5))
and
(v > (select min(v) from D )
       or ((select min(v) from D where v != 
              (select min(v) from D ) )
            -(select min(v) from D ) <= 1.5))

...should do the trick!

But thinking ahead... the following version may also be useful ;)

  WITH D as(
    SELECT 1 PK, 100.0 v FROM DUAL UNION
    SELECT 1,102.0 FROM DUAL UNION
    SELECT 1,103.0 FROM DUAL UNION
    SELECT 1,105.0 FROM DUAL UNION
    SELECT 1,107.0 FROM DUAL)
    SELECT  PK,avg(v)
    FROM D o
    -- the subqueries are correlated on PK so that each group is trimmed on
    -- its own and every scalar subquery returns a single row
    where (v < (select max(v) from D where PK = o.PK)
           or ((select max(v) from D where PK = o.PK)
                -(select max(v) from D where PK = o.PK and v != 
                  (select max(v) from D where PK = o.PK)) <= 1.5))
    and
    (v > (select min(v) from D where PK = o.PK)
           or ((select min(v) from D where PK = o.PK and v != 
                  (select min(v) from D where PK = o.PK))
                -(select min(v) from D where PK = o.PK) <= 1.5))

    GROUP BY PK

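In real life, you would also want to look at the execution plan (the work done) of the queries above on large data sets.

If you need any further clarification, I am at your disposal in the comments.

Regards, Ted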