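I'm curious about something. I have to write a select that retrieves data from a table containing millions of rows and applies some filters. I only need the data for a specific time period between start_date and current_date, and I need it to compute some requirements.
First, I tried using a WITH clause to filter the data and keep only the values between the start date and the current date, but it takes a long time to run (say 30 seconds, just for comparison).
After that I created a table and inserted into it using the same long-running select, and the insert ran in 2 seconds.
Can someone explain why? I want to learn how to optimize my queries, and I need to understand this.
I can't post the actual query, but I can sketch something.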
SELECT
employee_id,
access_date,
access_description,
day,
org_type,
access_day
FROM
(
SELECT
employee_id,
access_date,
access_description,
day,
org_type,
trunc(access_date) as access_day,
row_number() over (partition by employee_id, access_date order by access_date) as dedup
FROM employees epl
WHERE access_date >=to_date('01-JAN-2018','DD-MON-YYYY')
AND access_date <=to_date('31-JAN-2018','DD-MON-YYYY')
AND access_description='A1'
AND day='monday'
AND UPPER(org_type) <>'D'
AND NOT EXISTS ( SELECT null FROM emp e WHERE epl.employee_id=e.employee_id AND e.access_type='a')
AND NOT EXISTS (SELECT null FROM tmp t where epl.employee_id =t.employee_id)
)
where dedup=1;
I have an index on employee_id. If I run this query, it takes about 30 seconds.
If I create a table employee_filter and insert into it:
insert into employee_filter
SELECT
employee_id,
access_date,
access_description,
day,
org_type,
access_day
FROM (
SELECT
employee_id,
access_date,
access_description,
day,
org_type,
trunc(access_date) as access_day,
row_number() over (partition by employee_id, access_date order by access_date) as dedup
FROM employees epl
WHERE access_date >=to_date('01-JAN-2018','DD-MON-YYYY')
AND access_date <=to_date('31-JAN-2018','DD-MON-YYYY')
AND access_description='A1'
AND day='monday' AND UPPER(org_type) <>'D'
AND NOT EXISTS ( SELECT null FROM emp e WHERE epl.employee_id=e.employee_id AND e.access_type='a')
AND NOT EXISTS (SELECT null FROM tmp t where epl.employee_id =t.employee_id)
)
where dedup=1;
it runs in 2 seconds.
If I run the select without the row_number function, it also runs in about 2 seconds. But I need that function for deduplication.
SELECT
employee_id,
access_date,
access_description,
day,
org_type,
trunc(access_date) as access_day
FROM employees epl
WHERE access_date >=to_date('01-JAN-2018','DD-MON-YYYY')
AND access_date <=to_date('31-JAN-2018','DD-MON-YYYY')
AND access_description='A1'
AND day='monday'
AND UPPER(org_type) <>'D'
AND NOT EXISTS ( SELECT null FROM emp e WHERE epl.employee_id=e.employee_id AND e.access_type='a')
AND NOT EXISTS (SELECT null FROM tmp t where epl.employee_id =t.employee_id);