Query takes a long time to run in production

Time: 2019-05-20 10:56:32

Tags: sql hive query-optimization hiveql

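We have this query, in which we are trying to identify customers that have more than one credit option indicator. Its output has to feed our reports, which are shared with business users. We run the query almost every week, and it takes a long time to finish.

Query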

select CUST_ID, CUST_COUNT from (
SELECT N.CONS_ID AS CUST_ID,
COUNT(DISTINCT(case when M.CO_ID is null then 1 else m.co_id end)) AS CUST_COUNT
FROM CTS_VIEW.CNSLD_CREDIT_SUM M
INNER JOIN  CTS_VIEW.LEGACY_CODE_XREF  N
ON M.EE_ID = N.EE_GBL_IND
WHERE M.PROD_DT >= DATE '2018-12-31'
GROUP BY N.CONS_ID
  ) a
where CUST_COUNT>1;

Is there a better way to write this query so that it executes faster? We have already enabled CBO and vectorization at the session level.
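For reference, enabling CBO and vectorization for a session is usually done with settings along these lines. This is only a sketch, not the asker's actual configuration: the property names are standard Hive settings, but the question does not say which of them were set or whether table and column statistics are available.

```sql
-- Assumed session settings; the question only states that CBO and
-- vectorization were enabled, not which properties were used.
SET hive.cbo.enable=true;
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

-- CBO depends on statistics; these let the planner use them
-- (assumption: statistics have been gathered for the underlying tables).
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
```

Running EXPLAIN on the query before and after changing these settings is one way to see whether the plan actually changes.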

2 answers:

Answer 0 (score: 0)

A temporary-table query like this may be faster. You should also check that these tables have the indexes they need.

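-- Materialize the per-customer counts in a session-scoped temporary table (HiveQL syntax).
CREATE TEMPORARY TABLE tmp_cust_count AS
SELECT N.CONS_ID AS CUST_ID,
COUNT(DISTINCT COALESCE(M.CO_ID, 1)) AS CUST_COUNT
FROM CTS_VIEW.CNSLD_CREDIT_SUM M
INNER JOIN  CTS_VIEW.LEGACY_CODE_XREF  N
ON M.EE_ID = N.EE_GBL_IND
WHERE M.PROD_DT >= DATE '2018-12-31'
GROUP BY N.CONS_ID;

-- Keep only customers with more than one distinct CO_ID.
select CUST_ID, CUST_COUNT from
tmp_cust_count
where CUST_COUNT>1;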
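
The aggregate-then-filter shape used above can also be sketched as a single statement with HAVING, which avoids the intermediate table. This is a minimal sketch that assumes the same tables and columns as the question; it is not guaranteed to be faster, since the join and the COUNT(DISTINCT ...) still dominate the work.

```sql
-- Filter on the aggregate directly instead of wrapping it in a second query.
-- COALESCE(M.CO_ID, 1) mirrors the original CASE: NULL CO_ID values count as 1.
SELECT N.CONS_ID AS CUST_ID,
       COUNT(DISTINCT COALESCE(M.CO_ID, 1)) AS CUST_COUNT
FROM CTS_VIEW.CNSLD_CREDIT_SUM M
INNER JOIN CTS_VIEW.LEGACY_CODE_XREF N
  ON M.EE_ID = N.EE_GBL_IND
WHERE M.PROD_DT >= DATE '2018-12-31'
GROUP BY N.CONS_ID
HAVING COUNT(DISTINCT COALESCE(M.CO_ID, 1)) > 1;
```

The aggregate expression is repeated in HAVING rather than referenced by its alias, so the statement does not depend on alias resolution in the HAVING clause.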

Answer 1 (score: 0)

I think CTEs (Common Table Expressions) can serve this purpose:

-- The WITH clause sits at the top level of the statement; each CTE may use the ones defined before it.
WITH M_RAW_CTE AS
(SELECT CO_ID,EE_ID,PROD_DT FROM CTS_VIEW.CNSLD_CREDIT_SUM),
M_CTE AS
(SELECT * FROM M_RAW_CTE WHERE PROD_DT >= DATE '2018-12-31'),
N_CTE AS
(SELECT CONS_ID,EE_GBL_IND FROM CTS_VIEW.LEGACY_CODE_XREF),
COUNTS_CTE AS
(SELECT N_CTE.CONS_ID AS CUST_ID,
COUNT(DISTINCT COALESCE(M_CTE.CO_ID, 1)) AS CUST_COUNT
FROM M_CTE
INNER JOIN N_CTE ON M_CTE.EE_ID = N_CTE.EE_GBL_IND
GROUP BY N_CTE.CONS_ID)

SELECT CUST_ID, CUST_COUNT
FROM COUNTS_CTE
WHERE CUST_COUNT > 1;

The idea behind using CTEs is to make an intermediate result set reusable within a query.
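
As a sketch of what that reuse looks like, a CTE defined once can be referenced more than once in the same statement. The example below assumes the tables from the question; the CUST_TYPE labels and the COUNTS_CTE name are made up for illustration. Whether the engine computes the CTE once or expands it at each reference depends on the Hive version and configuration, so reuse in the query text does not by itself guarantee a faster run.

```sql
-- COUNTS_CTE is defined once and read by both branches of the UNION ALL.
WITH COUNTS_CTE AS (
  SELECT N.CONS_ID AS CUST_ID,
         COUNT(DISTINCT COALESCE(M.CO_ID, 1)) AS CUST_COUNT
  FROM CTS_VIEW.CNSLD_CREDIT_SUM M
  INNER JOIN CTS_VIEW.LEGACY_CODE_XREF N
    ON M.EE_ID = N.EE_GBL_IND
  WHERE M.PROD_DT >= DATE '2018-12-31'
  GROUP BY N.CONS_ID
)
-- Hypothetical summary: how many customers fall into each group.
SELECT 'multiple' AS CUST_TYPE, COUNT(*) AS CUSTOMERS
FROM COUNTS_CTE
WHERE CUST_COUNT > 1
UNION ALL
SELECT 'single' AS CUST_TYPE, COUNT(*) AS CUSTOMERS
FROM COUNTS_CTE
WHERE CUST_COUNT = 1;
```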