This is my query to find employees whose salary is above the average. I use a subquery:
SELECT salary
FROM Employee
WHERE salary > (SELECT AVG(salary) FROM employee)
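Is it possible to find these employees without using a subquery or a join?
Answer 0: (score: 2)
If it is acceptable to show only the highest above-average salary (rather than all salaries above the average), then this can be done without a sub-select: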
select salary,
salary - avg(salary) over () as diff_to_average,
avg(salary) over () as average_salary
from employees
order by 2 desc
fetch first 1 row only;
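(The above is standard ANSI SQL.)
The drawback is that you can't remove the diff_to_average column, because you can't use an alias in the where clause at the same level (you could remove the average_salary column, though). However, the whole question does not really make sense.
A solution that does not use a sub-select but only a derived table would be: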
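To make the limitation of the single-level query concrete, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The four sample salaries are made up for illustration, and SQLite's `LIMIT 1` stands in for `fetch first 1 row only`, which SQLite does not accept; window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: the average is 3000, so 4000 and 5000 are above it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (salary) VALUES (?)",
    [(s,) for s in (1000, 2000, 4000, 5000)],
)

# Same shape as the query above; LIMIT 1 replaces FETCH FIRST 1 ROW ONLY.
top_row = conn.execute(
    """
    SELECT salary,
           salary - AVG(salary) OVER () AS diff_to_average,
           AVG(salary) OVER () AS average_salary
    FROM employees
    ORDER BY 2 DESC
    LIMIT 1
    """
).fetchall()

print(top_row)
```

Only the row for the salary of 5000 comes back, even though 4000 is also above the average, which is why this variant only helps when the single highest salary is enough.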
select *
from (
select salary, avg(salary) over () as average_salary
from employees
) t
where salary > average_salary
order by salary;
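The derived table is only needed because SQL does not allow (re-)using a column alias in the WHERE clause at the same level, and a window function may not appear directly in WHERE either.
However, depending on the DBMS, the query from your question might be more efficient, because the window function in the derived table usually requires some kind of buffering, which does not happen with the sub-select in your question.
I created a table with three columns (id, name and salary) and one million rows, then compared the two queries. I did not create an index on the salary column.
In Postgres, the query using the window function buffers the result in order to evaluate it: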
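As a quick sanity check that the derived-table query and the sub-select from the question select the same salaries, the following sketch runs both against an in-memory SQLite database from Python. The sample salaries are invented for illustration, and SQLite (3.25 or later, for window function support) is only a stand-in for the databases measured in this answer.

```python
import sqlite3

# Hypothetical sample data: the average is 3000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (salary) VALUES (?)",
    [(s,) for s in (1000, 2000, 4000, 5000)],
)

# Derived table: the window function is computed in the inner query,
# so the outer WHERE clause can refer to average_salary.
derived = [
    row[0]
    for row in conn.execute(
        """
        SELECT salary
        FROM (
            SELECT salary, AVG(salary) OVER () AS average_salary
            FROM employees
        ) t
        WHERE salary > average_salary
        ORDER BY salary
        """
    )
]

# Scalar sub-select, as in the question.
subselect = [
    row[0]
    for row in conn.execute(
        """
        SELECT salary
        FROM employees
        WHERE salary > (SELECT AVG(salary) FROM employees)
        ORDER BY salary
        """
    )
]

print(derived, subselect)
```

Both lists contain the same above-average salaries; the two forms differ only in how the database has to execute them.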
Sort (cost=50423.64..51256.98 rows=333333 width=73) (actual time=598.267..608.075 rows=500409 loops=1)
Sort Key: t.salary
Sort Method: quicksort Memory: 82659kB
Buffers: shared hit=9346
-> Subquery Scan on t (cost=0.00..19846.00 rows=333333 width=73) (actual time=218.982..454.620 rows=500409 loops=1)
Filter: ((t.salary)::numeric > t.average_salary)
Rows Removed by Filter: 499591
Buffers: shared hit=9346
-> WindowAgg (cost=0.00..13846.00 rows=1000000 width=73) (actual time=218.978..336.965 rows=1000000 loops=1)
Buffers: shared hit=9346
-> Seq Scan on emp (cost=0.00..10346.00 rows=1000000 width=41) (actual time=0.022..55.422 rows=1000000 loops=1)
Buffers: shared hit=9346
Planning time: 0.099 ms
Execution time: 671.334 ms
The solution from the question, which uses the sub-query, is more efficient because it does not need any intermediate memory:
Seq Scan on emp (cost=12846.00..28192.00 rows=333333 width=41) (actual time=122.729..301.144 rows=500409 loops=1)
Filter: ((salary)::numeric > $0)
Rows Removed by Filter: 499591
Buffers: shared hit=18692
InitPlan 1 (returns $0)
-> Aggregate (cost=12846.00..12846.00 rows=1 width=32) (actual time=122.715..122.715 rows=1 loops=1)
Buffers: shared hit=9346
-> Seq Scan on emp emp_1 (cost=0.00..10346.00 rows=1000000 width=4) (actual time=0.004..54.477 rows=1000000 loops=1)
Buffers: shared hit=9346
Planning time: 0.062 ms
Execution time: 309.586 ms
The Oracle execution plan looks very similar; Oracle also buffers the result when the window function is used:
SQL_ID 2x0xhkm1pkamz, child number 0
-------------------------------------
select * from ( select salary, avg(salary) over () as average_salary
from emp ) t where salary > average_salary order by salary
Plan hash value: 1471144246
-----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | | 6660 (100)| 500K|00:00:01.02 | 6679 | | | |
| 1 | SORT ORDER BY | | 1 | 655K| 16M| 22M| 6660 (1)| 500K|00:00:01.02 | 6679 | 17M| 1562K| 15M (0)|
|* 2 | VIEW | | 1 | 655K| 16M| | 1812 (1)| 500K|00:00:00.79 | 6679 | | | |
| 3 | WINDOW BUFFER | | 1 | 655K| 8325K| | 1812 (1)| 1000K|00:00:00.65 | 6679 | 34M| 2096K| 30M (0)|
| 4 | TABLE ACCESS FULL| EMP | 1 | 655K| 8325K| | 1812 (1)| 1000K|00:00:00.09 | 6679 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$2 / T@SEL$1
3 - SEL$2
4 - SEL$2 / EMP@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("SALARY">"AVERAGE_SALARY")
As with Postgres, the query using the sub-select is also more efficient in Oracle:
SQL_ID 6fmzs2ru2cxa5, child number 1
-------------------------------------
select * from emp where salary > (select avg(salary) from emp)
Plan hash value: 1876299339
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1814 (100)| 500K|00:00:00.27 | 14347 |
|* 1 | TABLE ACCESS FULL | EMP | 1 | 500K| 37M| 2 (0)| 500K|00:00:00.27 | 14347 |
| 2 | SORT AGGREGATE | | 1 | 1 | 13 | | 1 |00:00:00.18 | 6679 |
| 3 | TABLE ACCESS FULL| EMP | 1 | 655K| 8325K| 1812 (1)| 1000K|00:00:00.09 | 6679 |
-----------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / EMP@SEL$1
2 - SEL$2
3 - SEL$2 / EMP@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SALARY">)
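So if your question is "I am looking for a more efficient query", then the answer is (at least for the two databases above): your query is as efficient as it gets.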