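I have two sets of data from an external source: customers' purchase dates and customers' last email click/open dates. They are stored in two tables, PURCHASE_INTER and ACTIVITY_INTER respectively. There are multiple purchase records per customer, and I need to select the latest purchase date, whereas the activity data is unique per customer. The two data sets are independent of each other, and a customer present in one may be absent from the other. We wrote the query below. It combines the two tables, groups them by PERSON_ID (the customer's ID in the external source) and takes the latest dates, joins our CUSTOMER table to get the customer's email, and then joins one more table, INTERACTION, where this data is ultimately stored, to determine whether each row needs an insert or an update. Could you suggest how to improve the performance of this query? It is very slow, taking more than 10 hours. The PURCHASE_INTER and ACTIVITY_INTER tables contain millions of records.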
SELECT INTER.*, C.ID AS CUSTOMER_ID, C.EMAIL AS CUSTOMER_EMAIL, LSI.ID AS INTERACTION_ID,
       ROW_NUMBER() OVER (ORDER BY PERSON_ID ASC) AS RN
FROM (
    SELECT PERSON_ID AS PERSON_ID,
           MAX(LAST_CLICK_DATE) AS LAST_CLICK_DATE,
           MAX(LAST_OPEN_DATE) AS LAST_OPEN_DATE,
           MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
    FROM (
        SELECT ACT.PERSON_ID AS PERSON_ID,
               ACT.LAST_CLICK_DATE AS LAST_CLICK_DATE,
               ACT.LAST_OPEN_DATE AS LAST_OPEN_DATE,
               NULL AS LAST_PURCHASE_DATE
        FROM ACTIVITY_INTER ACT
        WHERE ACT.JOB_ID = 77318317
        UNION
        SELECT PUR.PERSON_ID AS PERSON_ID,
               NULL AS LAST_CLICK_DATE,
               NULL AS LAST_OPEN_DATE,
               PUR.LAST_PURCHASE_DATE AS LAST_PURCHASE_DATE
        FROM PURCHASE_INTER PUR
        WHERE PUR.JOB_ID = 77318317
    ) GROUP BY PERSON_ID
) INTER LEFT JOIN CUSTOMER C ON INTER.PERSON_ID = C.PERSON_ID
LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;
Answer (score: 5)
For this query, I would suggest the following indexes:
ACTIVITY_INTER(JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE)
PURCHASE_INTER(JOB_ID, PERSON_ID, LAST_PURCHASE_DATE)
CUSTOMER(PERSON_ID)
INTERACTION(CUSTOMER_ID)
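(For the first two indexes, the leading column, JOB_ID, matters more than the other columns unless a very large number of rows match. The trailing columns make those indexes covering, so Oracle can compute the MAX values without visiting the table, which pays off when many rows match.)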
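One way to see the effect of these composite indexes is to build a toy schema and ask a query planner which access path it picks. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, so it only illustrates the idea. The table definitions are assumptions reduced to the columns the query touches, dates are stored as ISO-8601 text, and the index names are hypothetical.

```python
import sqlite3
from typing import List, Sequence

# Toy schema: only the columns the query touches (an assumption; the real
# Oracle tables certainly have more). Dates are ISO-8601 TEXT here.
SCHEMA = """
CREATE TABLE ACTIVITY_INTER (JOB_ID INTEGER, PERSON_ID INTEGER,
                             LAST_CLICK_DATE TEXT, LAST_OPEN_DATE TEXT);
CREATE TABLE PURCHASE_INTER (JOB_ID INTEGER, PERSON_ID INTEGER,
                             LAST_PURCHASE_DATE TEXT);
CREATE TABLE CUSTOMER (ID INTEGER PRIMARY KEY, PERSON_ID INTEGER, EMAIL TEXT);
CREATE TABLE INTERACTION (ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);

-- The suggested indexes (names are hypothetical). JOB_ID leads because it is
-- the WHERE filter; the trailing columns make the first two indexes covering.
CREATE INDEX idx_activity_job ON ACTIVITY_INTER
    (JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE);
CREATE INDEX idx_purchase_job ON PURCHASE_INTER
    (JOB_ID, PERSON_ID, LAST_PURCHASE_DATE);
CREATE INDEX idx_customer_person ON CUSTOMER (PERSON_ID);
CREATE INDEX idx_interaction_customer ON INTERACTION (CUSTOMER_ID);
"""

# Per-table aggregations, as used in the FULL OUTER JOIN rewrite.
ACTIVITY_AGG = """
SELECT PERSON_ID, MAX(LAST_CLICK_DATE), MAX(LAST_OPEN_DATE)
FROM ACTIVITY_INTER WHERE JOB_ID = ? GROUP BY PERSON_ID
"""

PURCHASE_AGG = """
SELECT PERSON_ID, MAX(LAST_PURCHASE_DATE)
FROM PURCHASE_INTER WHERE JOB_ID = ? GROUP BY PERSON_ID
"""


def plan_details(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Return the human-readable 'detail' column (the last one) of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for sql in (ACTIVITY_AGG, PURCHASE_AGG):
        for detail in plan_details(conn, sql, (77318317,)):
            print(detail)
```

On Oracle itself, the equivalent check is EXPLAIN PLAN FOR the query followed by SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY). An index range scan on the new index with no table access for ACTIVITY_INTER or PURCHASE_INTER would suggest the index is being used as a covering index.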
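Also, change UNION to UNION ALL. UNION incurs the overhead of removing duplicates (a sort or hash over the combined rows), and duplicates are impossible here, at least between the two subqueries, because each subquery fills a different set of columns. Any duplicates within a single subquery would not change the MAX() results anyway.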
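To check the claim that UNION ALL cannot change the result, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle. The sample rows are made up; they include an exact duplicate purchase row and a row for another JOB_ID. The query is the question's inner aggregation with the set operator swapped in.

```python
import sqlite3
from typing import List, Tuple

JOB_ID = 77318317  # the job id used in the question

# Made-up sample rows; dates as ISO-8601 TEXT so MAX() orders them correctly.
ACTIVITY_ROWS = [
    (JOB_ID, 1, "2023-05-02", "2023-05-01"),
    (JOB_ID, 2, "2023-04-10", "2023-04-11"),
    (1, 1, "2099-01-01", "2099-01-01"),  # different job: must be filtered out
]
PURCHASE_ROWS = [
    (JOB_ID, 1, "2023-03-01"),
    (JOB_ID, 1, "2023-06-15"),
    (JOB_ID, 1, "2023-06-15"),  # exact duplicate row
    (JOB_ID, 3, "2023-01-20"),
]

# The question's inner aggregation; {set_op} is UNION or UNION ALL.
INNER_AGG = """
SELECT PERSON_ID,
       MAX(LAST_CLICK_DATE)    AS LAST_CLICK_DATE,
       MAX(LAST_OPEN_DATE)     AS LAST_OPEN_DATE,
       MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
FROM (
    SELECT PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE, NULL AS LAST_PURCHASE_DATE
    FROM ACTIVITY_INTER WHERE JOB_ID = :job
    {set_op}
    SELECT PERSON_ID, NULL, NULL, LAST_PURCHASE_DATE
    FROM PURCHASE_INTER WHERE JOB_ID = :job
) AS combined
GROUP BY PERSON_ID
ORDER BY PERSON_ID
"""


def load_sample(conn: sqlite3.Connection) -> None:
    """Create the two staging tables and fill them with the sample rows."""
    conn.execute("CREATE TABLE ACTIVITY_INTER (JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE)")
    conn.execute("CREATE TABLE PURCHASE_INTER (JOB_ID, PERSON_ID, LAST_PURCHASE_DATE)")
    conn.executemany("INSERT INTO ACTIVITY_INTER VALUES (?, ?, ?, ?)", ACTIVITY_ROWS)
    conn.executemany("INSERT INTO PURCHASE_INTER VALUES (?, ?, ?)", PURCHASE_ROWS)


def aggregate(conn: sqlite3.Connection, set_op: str) -> List[Tuple]:
    """Run the inner aggregation with the given set operator."""
    sql = INNER_AGG.format(set_op=set_op)
    return conn.execute(sql, {"job": JOB_ID}).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    print(aggregate(conn, "UNION") == aggregate(conn, "UNION ALL"))
```

Both variants return the same rows. The only difference is that UNION adds a de-duplication step over the combined input before the GROUP BY, and UNION ALL skips it.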
You might also want to replace the INTER subquery with a FULL OUTER JOIN of two separately aggregated subqueries:
SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) AS PERSON_ID,
       a.LAST_CLICK_DATE, a.LAST_OPEN_DATE, p.LAST_PURCHASE_DATE
FROM (SELECT ACT.PERSON_ID AS PERSON_ID,
             MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
             MAX(ACT.LAST_OPEN_DATE) AS LAST_OPEN_DATE
      FROM ACTIVITY_INTER ACT
      WHERE ACT.JOB_ID = 77318317
      GROUP BY ACT.PERSON_ID
     ) a FULL OUTER JOIN
     (SELECT PUR.PERSON_ID AS PERSON_ID,
             MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
      FROM PURCHASE_INTER PUR
      WHERE PUR.JOB_ID = 77318317
      GROUP BY PUR.PERSON_ID
     ) p
     ON a.PERSON_ID = p.PERSON_ID
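This gives Oracle more optimization options, because each aggregation runs directly against a base table, so the indexes and better statistics are available when processing it.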
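The FULL OUTER JOIN keeps every PERSON_ID that appears on either side and fills the missing side with NULLs. The sketch below mirrors that logic in plain Python purely to illustrate the semantics; it says nothing about Oracle performance. Each source is aggregated on its own, then the two results are merged on PERSON_ID. The row shapes are assumptions, and the input rows are assumed to be already filtered to the JOB_ID of interest.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row shapes mirroring the columns the rewritten query reads.
ActivityRow = Tuple[int, Optional[date], Optional[date]]  # PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE
PurchaseRow = Tuple[int, Optional[date]]                  # PERSON_ID, LAST_PURCHASE_DATE
MergedRow = Tuple[int, Optional[date], Optional[date], Optional[date]]


def _max(a: Optional[date], b: Optional[date]) -> Optional[date]:
    """SQL-style MAX of two values: None (NULL) is ignored."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def aggregate_activity(rows: Iterable[ActivityRow]) -> Dict[int, Tuple[Optional[date], Optional[date]]]:
    """Subquery a: GROUP BY PERSON_ID with MAX on both activity dates."""
    out: Dict[int, Tuple[Optional[date], Optional[date]]] = {}
    for person_id, click, opened in rows:
        prev_click, prev_open = out.get(person_id, (None, None))
        out[person_id] = (_max(prev_click, click), _max(prev_open, opened))
    return out


def aggregate_purchases(rows: Iterable[PurchaseRow]) -> Dict[int, Optional[date]]:
    """Subquery p: GROUP BY PERSON_ID with MAX on the purchase date."""
    out: Dict[int, Optional[date]] = {}
    for person_id, purchased in rows:
        out[person_id] = _max(out.get(person_id), purchased)
    return out


def full_outer_join(activity: Dict[int, Tuple[Optional[date], Optional[date]]],
                    purchases: Dict[int, Optional[date]]) -> List[MergedRow]:
    """Keep every PERSON_ID from either side; the missing side yields None (NULL)."""
    merged: List[MergedRow] = []
    # The union of keys plays the role of COALESCE(a.PERSON_ID, p.PERSON_ID).
    for person_id in sorted(activity.keys() | purchases.keys()):
        click, opened = activity.get(person_id, (None, None))
        merged.append((person_id, click, opened, purchases.get(person_id)))
    return merged


if __name__ == "__main__":
    act = [(1, date(2023, 5, 2), date(2023, 5, 1)), (2, date(2023, 4, 10), None)]
    pur = [(1, date(2023, 3, 1)), (1, date(2023, 6, 15)), (3, date(2023, 1, 20))]
    for row in full_outer_join(aggregate_activity(act), aggregate_purchases(pur)):
        print(row)
```

Person 2 has only activity and person 3 has only a purchase, yet both survive the merge with NULLs on the missing side, which is exactly what the COALESCE over the two PERSON_ID columns provides in the SQL version.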