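I need to produce a report on LC call numbers, grouped by number range. The call numbers look like the sample below, and I need to extract the second field (the number before the punctuation) to group on: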
CALL_NO_ID1
--------------
a!3243 .m43 12
a#435 234 1999
cs"345 1973.
...
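To make the extraction concrete, here is a small sketch that applies the expression from my query to the three sample values above. It assumes an Oracle session; `sample_calls` and `parsed_num` are names made up for this illustration. For these rows the second field is 3243, 435 and 345, and LPAD pads it with zeros on the left to seven characters.

```sql
-- Hypothetical stand-in for DWH_FACT_ITEMS holding only the sample rows.
WITH sample_calls AS (
  SELECT 'a!3243 .m43 12' AS call_no_id1 FROM dual UNION ALL
  SELECT 'a#435 234 1999' FROM dual UNION ALL
  SELECT 'cs"345 1973.'   FROM dual
)
SELECT call_no_id1,
       -- 1. turn ", # and ! into spaces
       -- 2. take the second space-delimited token
       -- 3. strip everything that is not a digit, cast, then left-pad
       LPAD(CAST(REGEXP_REPLACE(
                   REGEXP_SUBSTR(
                     REGEXP_REPLACE(call_no_id1, '["]|[#]|[!]', ' '),
                     '[^ ]+|["]|[#]', 1, 2),
                   '[^0-9]+', '') AS NUMBER), 7, '0') AS parsed_num
FROM sample_calls;
```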
Here is my SQL:
select count("CALL_NO_ID1") "No_of_Items",
case
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 0 AND 999)AND ("CALL_NO_DESC1" LIKE 'KG %') THEN 'KG 0-999 - Federal law Common and collective state law Individual states US - Latin AmericaGeneral'
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 0 AND 999)AND ("CALL_NO_DESC1" LIKE 'KH %') THEN 'KH 0-999 - Federal law Common and collective state law Individual states US - South AmericaGeneral '
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 100)AND ("CALL_NO_DESC1" LIKE 'DE %') THEN 'DE 1-100 - HistoryGeneral - The Mediterranean Region The Greco-Roman World'
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 1050)AND ("CALL_NO_DESC1" LIKE 'TR %') THEN 'TR 1-1050 - Photography'
...
... (around 450 case conditions)
...
else "CALL_NO_ID1"
end "Primary Call"
from DWH_FACT_ITEMS
group by
case
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 0 AND 999)AND ("CALL_NO_DESC1" LIKE 'KG %') THEN 'KG 0-999 - Federal law Common and collective state law Individual states US - Latin AmericaGeneral'
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 0 AND 999)AND ("CALL_NO_DESC1" LIKE 'KH %') THEN 'KH 0-999 - Federal law Common and collective state law Individual states US - South AmericaGeneral '
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 100)AND ("CALL_NO_DESC1" LIKE 'DE %') THEN 'DE 1-100 - HistoryGeneral - The Mediterranean Region The Greco-Roman World'
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 1050)AND ("CALL_NO_DESC1" LIKE 'TR %') THEN 'TR 1-1050 - Photography'
...
... (around 450 case conditions)
...
However, it takes a very long time (2 to 3 hours) to get the result. Are there any suggestions for improving my SQL?
Thanks!
Moris
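Answer 0 (score: 0)
I would add an extra column, CALL_NO_CLEARED, to hold the extracted number. Apply the expression to all existing values to populate this derived column.
You can add an ON INSERT/UPDATE trigger so that the column is populated as soon as a row is added or changed.
Then you can index CALL_NO_CLEARED and use it in the select to speed things up.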
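A minimal sketch of that setup, assuming Oracle and a NUMBER column. The trigger name `trg_items_call_no_cleared` and index name `idx_items_call_no_cleared` are made up for illustration. The LPAD from the original expression is left out here because the column is numeric; that should not affect the BETWEEN comparisons as long as the extracted number has at most seven digits.

```sql
-- New column holding the pre-parsed number (the name comes from this answer).
ALTER TABLE DWH_FACT_ITEMS ADD (CALL_NO_CLEARED NUMBER);

-- One-off backfill of the existing rows.
UPDATE DWH_FACT_ITEMS
   SET CALL_NO_CLEARED = TO_NUMBER(REGEXP_REPLACE(
         REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1", '["]|[#]|[!]', ' '),
                       '[^ ]+|["]|[#]', 1, 2),
         '[^0-9]+', ''));
COMMIT;

-- Keep the column in sync for new or changed call numbers.
CREATE OR REPLACE TRIGGER trg_items_call_no_cleared
BEFORE INSERT OR UPDATE OF CALL_NO_ID1 ON DWH_FACT_ITEMS
FOR EACH ROW
BEGIN
  :NEW.CALL_NO_CLEARED := TO_NUMBER(REGEXP_REPLACE(
      REGEXP_SUBSTR(REGEXP_REPLACE(:NEW.CALL_NO_ID1, '["]|[#]|[!]', ' '),
                    '[^ ]+|["]|[#]', 1, 2),
      '[^0-9]+', ''));
END;
/

-- Index on the parsed number.
CREATE INDEX idx_items_call_no_cleared ON DWH_FACT_ITEMS (CALL_NO_CLEARED);
```

The WHEN conditions can then compare CALL_NO_CLEARED directly instead of re-running the regular expressions for every row.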
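Update:
I can suggest another way. The most time-consuming part appears to be the call to
LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0')
so for each row we compute it up to 450 times (once for each WHEN mentioned).
Try moving the calculation into a subquery and applying the GROUP BY afterwards, e.g.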
select count(sub.CALL_NO_ID1) "No_of_Items" -- select * is not valid together with this GROUP BY
FROM (
select
CALL_NO_ID1,
CALL_NO_DESC1, -- also needed by the CASE in the outer query
LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') as sub_num
from DWH_FACT_ITEMS) sub
group by
case
WHEN (sub.sub_num BETWEEN 0 AND 999)AND (sub.CALL_NO_DESC1 LIKE 'KG %') THEN 'KG 0-999 - Federal law Common and collective state law Individual states US - Latin AmericaGeneral'
...
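Answer 1 (score: 0)
There are several ways to improve the query. My tests show that eliminating the CASE from the GROUP BY (by using a subquery) improves the maintainability and size of the query but leaves performance unchanged.
A noticeable improvement came from ordering the WHEN branches so that the most frequent conditions are at the top.
The idea is simple: if a match is found early in the CASE, the remaining conditions are skipped.
An even better improvement came from reordering the predicates inside each WHEN. If the CALL_NO_DESC1 prefix does not match, the regular-expression processing is not invoked.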
WHEN ("CALL_NO_DESC1" LIKE 'TR %') and
(LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 1050)
THEN 'TR 1-1050 - Photography'
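A searched CASE is documented to test its WHEN conditions in order and stop at the first match, whereas, as far as I know, Oracle does not promise the order in which the two operands of an AND are evaluated. One way to make the "cheap prefix test first" order explicit is to nest the CASE, as sketched below. This is not the query I tested. It assumes every WHEN checks a CALL_NO_DESC1 prefix, so a row can only ever match branches of its own prefix; the two branches and their labels are taken from the question.

```sql
SELECT COUNT("CALL_NO_ID1") AS "No_of_Items", "Primary Call"
FROM (
  SELECT "CALL_NO_ID1",
         CASE
           -- outer level: cheap LIKE test only
           WHEN "CALL_NO_DESC1" LIKE 'TR %' THEN
             CASE
               -- inner level: the regex runs only for TR rows
               WHEN LPAD(CAST(REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1", '["]|[#]|[!]', ' '), '[^ ]+|["]|[#]', 1, 2), '[^0-9]+', '') AS NUMBER), 7, '0')
                    BETWEEN 1 AND 1050 THEN 'TR 1-1050 - Photography'
               -- further TR ranges would go here
               ELSE "CALL_NO_ID1"
             END
           WHEN "CALL_NO_DESC1" LIKE 'DE %' THEN
             CASE
               WHEN LPAD(CAST(REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1", '["]|[#]|[!]', ' '), '[^ ]+|["]|[#]', 1, 2), '[^0-9]+', '') AS NUMBER), 7, '0')
                    BETWEEN 1 AND 100 THEN 'DE 1-100 - HistoryGeneral - The Mediterranean Region The Greco-Roman World'
               ELSE "CALL_NO_ID1"
             END
           -- further prefixes would go here
           ELSE "CALL_NO_ID1"
         END AS "Primary Call"
  FROM DWH_FACT_ITEMS
)
GROUP BY "Primary Call";
```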
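The final step is to invoke the REGEXP processing only once, in a subquery.
Putting it all together, I ended up with this query, which greatly reduced the elapsed time (on my test data).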
with dta as (
select "CALL_NO_ID1",
LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') parsed_num,
"CALL_NO_DESC1"
from DWH_FACT_ITEMS
),
dta2 as (
select
CALL_NO_ID1, CALL_NO_DESC1,
case
-- put the most frequent condition on the top
-- start with the most selective predicate
WHEN ("CALL_NO_DESC1" LIKE 'TR %') and parsed_num BETWEEN 1 AND 1050 THEN 'TR 1-1050 - Photography'
--....
else "CALL_NO_ID1"
end "Primary Call"
from dta
)
select count("CALL_NO_ID1") "No_of_Items","Primary Call"
from dta2
group by "Primary Call"
;
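Answer 2 (score: 0)
Use a WITH clause and the /*+ MATERIALIZE */ hint so that Oracle performs the expensive operation only once.
For 400,000 rows, this should be much faster than 2-3 hours: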
WITH parsed_call_numbers as ( SELECT /*+ MATERIALIZE */
CALL_NO_ID1,
LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') call_Number_part,
CALL_NO_DESC1
from DWH_FACT_ITEMS ) ,
primary_calls AS ( SELECT /*+ MATERIALIZE */
CALL_NO_ID1,
case
WHEN (call_number_part BETWEEN 0 AND 999) AND ("CALL_NO_DESC1" LIKE 'KG %') THEN 'KG 0-999 - Federal law Common and collective state law Individual states US - Latin AmericaGeneral'
WHEN (call_number_part BETWEEN 0 AND 999) AND ("CALL_NO_DESC1" LIKE 'KH %') THEN 'KH 0-999 - Federal law Common and collective state law Individual states US - South AmericaGeneral '
WHEN (call_number_part BETWEEN 1 AND 100) AND ("CALL_NO_DESC1" LIKE 'DE %') THEN 'DE 1-100 - HistoryGeneral - The Mediterranean Region The Greco-Roman World'
WHEN (call_number_part BETWEEN 1 AND 1050) AND ("CALL_NO_DESC1" LIKE 'TR %') THEN 'TR 1-1050 - Photography'
--...
--... (around 450 case conditions)
--...
else "CALL_NO_ID1"
end "Primary Call"
from parsed_call_numbers )
select count("CALL_NO_ID1") "No_of_Items", "Primary Call"
FROM primary_calls
group by "Primary Call"
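MATERIALIZE is, as far as I know, not an officially documented hint, so it may be worth checking that it took effect on your version. A sketch of one way to do that with EXPLAIN PLAN and DBMS_XPLAN follows; the WITH query is cut down to the first subquery to keep it short, and in practice you would put the full statement after EXPLAIN PLAN FOR. When the subquery is materialized, the plan typically contains a TEMP TABLE TRANSFORMATION step.

```sql
-- Store the plan for a reduced version of the query above.
EXPLAIN PLAN FOR
WITH parsed_call_numbers AS (
  SELECT /*+ MATERIALIZE */
         "CALL_NO_ID1",
         LPAD(CAST(REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1", '["]|[#]|[!]', ' '), '[^ ]+|["]|[#]', 1, 2), '[^0-9]+', '') AS NUMBER), 7, '0') AS call_number_part,
         "CALL_NO_DESC1"
  FROM DWH_FACT_ITEMS
)
SELECT COUNT(*) FROM parsed_call_numbers;

-- Show the stored plan.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```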