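I am writing a report with SQR. I cannot change the structure of the database, and I cannot use PL/SQL for this task.
Because the report can be run from remote locations, I do not want to make multiple calls to the database from SQR. My goal is to return everything in one SQL statement, containing only the records I need to report, to improve run times over slow connections.
I have this working now, but I am concerned about database performance.
The "transactions" table contains the following fields that can be used for this: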
account_num number(10) -- the account number
seq_num number(10) -- not a real sequence, it is unique to account_num
check_num number(10) -- the number on the check
postdate date
The primary key is (account_num, seq_num).
Sample data looks like this:
account_num    seq_num  check_num postdate
----------- ---------- ---------- ----------
          1         11        200 2014-07-13
          1         16        201 2014-07-14
          1         23        205 2014-07-15
          2         52        282 2014-07-13
          2         66        284 2014-07-14
          2         72        231 2014-07-15
          3         11        201 2014-07-13
          3         12        202 2014-07-14
          3         15        203 2014-07-15
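Note: the table holds many other types of transactions, but we filter on a list of transaction types. That filter is not important to this question, so I have left it out. Volume averages about 750,000 transactions per month (all transactions, not just checks), of which about 10,000 check transactions are reported on average.
The selection criterion is to return all check transactions that occurred between two dates (inclusive, usually the first and last day of the month) for any account where the difference between two adjacent check numbers, once sorted, is greater than X (we will use 10 here).
Using the sample data above, the result looks like this: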
account_num    seq_num  check_num postdate
----------- ---------- ---------- ----------
          2         52        282 2014-07-13
          2         66        284 2014-07-14
          2         72        231 2014-07-15
All checks from account_num 2 are returned because the difference between check_num 282 and 231 is greater than 10.
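To make the rule concrete, here is a minimal sketch in Python (not SQR or SQL) that applies it to the sample rows. The function names `accounts_with_gap` and `report` are hypothetical, and the date filter is left out because every sample row falls in the same month.

```python
from collections import defaultdict

# Sample rows from the question: (account_num, seq_num, check_num, postdate)
ROWS = [
    (1, 11, 200, "2014-07-13"),
    (1, 16, 201, "2014-07-14"),
    (1, 23, 205, "2014-07-15"),
    (2, 52, 282, "2014-07-13"),
    (2, 66, 284, "2014-07-14"),
    (2, 72, 231, "2014-07-15"),
    (3, 11, 201, "2014-07-13"),
    (3, 12, 202, "2014-07-14"),
    (3, 15, 203, "2014-07-15"),
]


def accounts_with_gap(rows, threshold):
    """Return accounts whose sorted check numbers contain an adjacent gap > threshold."""
    checks = defaultdict(list)
    for account_num, _seq_num, check_num, _postdate in rows:
        checks[account_num].append(check_num)

    flagged = set()
    for account_num, nums in checks.items():
        nums.sort()
        # Compare each check number with the next one in sorted order.
        if any(nxt - cur > threshold for cur, nxt in zip(nums, nums[1:])):
            flagged.add(account_num)
    return flagged


def report(rows, threshold=10):
    """Return every row belonging to a flagged account, in input order."""
    flagged = accounts_with_gap(rows, threshold)
    return [row for row in rows if row[0] in flagged]


if __name__ == "__main__":
    for row in report(ROWS):
        print(row)
```

Only account 2 is flagged here: its sorted check numbers are 231, 282, 284, and the first adjacent gap is 51.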
I built the following SQL to return the result above:
select
    t1.*
from
    transactions t1
    join (
        select
            t3.account_num,
            t3.min_postdate,
            t3.max_postdate,
            max(t3.check_diff)
        from (
            select distinct
                t4.account_num,
                lead(t4.check_num, 1, t4.check_num) over (partition by t4.account_num order by t4.check_num) - t4.check_num as check_diff,
                min(t4.postdate) over (partition by t4.account_num) min_postdate,
                max(t4.postdate) over (partition by t4.account_num) max_postdate
            from
                transactions t4
            where
                t4.postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))) t3
        group by
            t3.account_num,
            t3.min_postdate,
            t3.max_postdate
        having max(t3.check_diff) > 10) t2
        on t1.account_num = t2.account_num
        and t1.postdate between t2.min_postdate and t2.max_postdate
;
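I would like to return the seq_num of every check from t4, so that I end up joining to t1 on its primary key. I tried LISTAGG, which does group the numbers together:
listagg(t4.seq_num,',') within group (order by seq_num) over (partition by account_num) seq_nums
But this is where I get stuck: working with the comma-separated string. I can make it work using INSTR, but that cannot use the primary key and performance is terrible:
instr(',' || t2.seq_nums || ',', ',' || t1.seq_num || ',') > 0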
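As an aside on the substring test itself, a rough Python sketch shows why both the list and the value need to be wrapped in delimiters before searching. Python's `in` operator stands in for Oracle's `INSTR(...) > 0` here, and both helper names are hypothetical.

```python
def in_csv_naive(value, csv):
    """Plain substring search: matches partial numbers too."""
    return str(value) in csv


def in_csv_delimited(value, csv, sep=","):
    """Wrap both sides in the separator so only whole items match."""
    return f"{sep}{value}{sep}" in f"{sep}{csv}{sep}"


if __name__ == "__main__":
    seq_nums = "11,16,23"
    print(in_csv_naive(1, seq_nums))       # substring hit inside "11"
    print(in_csv_delimited(1, seq_nums))   # no whole item equals 1
    print(in_csv_delimited(16, seq_nums))  # whole item match
```

Even with the delimiters in place, this remains a string scan per row, which is consistent with the poor performance described above.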
I tried joining against it:
    join (
        select
            t2.account_num,
            regexp_substr(t2.seq_nums,'[^,]+{1}',1,level) seq_num
        from
            dual
        connect by
            level <= length(regexp_replace(t2.seq_nums,'[^,]*')) + 1) t5
        on t1.account_num = t5.account_num
        and t1.seq_num = t5.seq_num
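But I should have known better (ORA-00904): t2 is never visible inside the joined select.
Does anyone have any clever ideas?
Answer 0 (score: 2)
I would avoid the join entirely by using subqueries and more analytic functions: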
select
    account_num, seq_num, check_num, postdate
from
(
    select account_num,
        seq_num,
        check_num,
        postdate,
        max(check_gap) over (partition by account_num) as max_check_gap
    from
    (
        select account_num,
            seq_num,
            check_num,
            postdate,
            lead(check_num) over (partition by account_num order by check_num)
                - check_num as check_gap
        from
            transactions
        where postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))
    )
)
where
    max_check_gap > 10
order by account_num, check_num;
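SQL Fiddle with the original query, an intermediate attempt that misread the 10-check gap rule, and this version. All of them give the same result for this data.
This does not address the specific question you asked, but hopefully it solves your underlying performance problem in a different way.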
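For anyone who wants to try the single-pass analytic approach without an Oracle instance, here is a rough sketch that runs the same query shape against the sample data in an in-memory SQLite database from Python. Several things here are assumptions rather than part of the original: SQLite 3.25 or later (needed for window functions), dates stored as ISO text, and bind parameters `:first_day`, `:last_day` and `:min_gap` in place of `trunc(sysdate,'mm')`, `last_day(trunc(sysdate))` and the literal 10. The helper name `run_report` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (account_num, seq_num, check_num, postdate)
SAMPLE = [
    (1, 11, 200, "2014-07-13"), (1, 16, 201, "2014-07-14"), (1, 23, 205, "2014-07-15"),
    (2, 52, 282, "2014-07-13"), (2, 66, 284, "2014-07-14"), (2, 72, 231, "2014-07-15"),
    (3, 11, 201, "2014-07-13"), (3, 12, 202, "2014-07-14"), (3, 15, 203, "2014-07-15"),
]

# Same shape as the analytic query: one scan of transactions, no self-join.
QUERY = """
select account_num, seq_num, check_num, postdate
from (
    select account_num, seq_num, check_num, postdate,
           max(check_gap) over (partition by account_num) as max_check_gap
    from (
        select account_num, seq_num, check_num, postdate,
               lead(check_num) over (partition by account_num order by check_num)
                   - check_num as check_gap
        from transactions
        where postdate between :first_day and :last_day
    )
)
where max_check_gap > :min_gap
order by account_num, check_num
"""


def run_report(first_day, last_day, min_gap=10):
    """Load the sample data into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table transactions ("
            " account_num integer, seq_num integer, check_num integer,"
            " postdate text, primary key (account_num, seq_num))"
        )
        conn.executemany("insert into transactions values (?, ?, ?, ?)", SAMPLE)
        params = {"first_day": first_day, "last_day": last_day, "min_gap": min_gap}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_report("2014-07-01", "2014-07-31"):
        print(row)
```

The last row of each partition gets a NULL gap from `lead`, which `max` ignores, so the lowest-gap accounts are filtered out by the outer `where` without any special handling.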
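If you do want to stick with a join, which hits the table more than once and will be less efficient, you can use collect. Here is a crude version, which could probably be improved, that reads the collection back through table: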
select
    t1.*
from
    transactions t1
    join (
        select
            t3.account_num,
            collect(t3.seq_num) as seq_nums,
            t3.min_postdate,
            t3.max_postdate,
            max(t3.check_diff)
        from (
            select distinct
                t4.account_num,
                t4.seq_num,
                lead(t4.check_num, 1, t4.check_num) over (partition by t4.account_num order by t4.check_num) - t4.check_num as check_diff,
                min(t4.postdate) over (partition by t4.account_num) min_postdate,
                max(t4.postdate) over (partition by t4.account_num) max_postdate
            from
                transactions t4
            where
                t4.postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))) t3
        group by
            t3.account_num,
            t3.min_postdate,
            t3.max_postdate
        having max(t3.check_diff) > 10) t2
        on t1.account_num = t2.account_num
        and t1.seq_num in (select * from table(t2.seq_nums))
;