SUM(CASE) is very inefficient in Hive

Time: 2014-01-15 22:54:00

Tags: sql hadoop hive

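I have this query, in which both tables are smaller than 1 MB, yet it still takes a very long time to run. Is this something specific to Hive, where you shouldn't use SUM(CASE)?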

SELECT
  a.weeknum,
  SUM(CASE
        WHEN a.payment_action_type = 'chargeback' THEN b.chargeback_multiplier * a.num_accounts
        WHEN a.payment_action_type = 'refund' THEN b.refund_multiplier * a.num_accounts
      END) AS num_reversals
FROM hipal2_1921596 a
JOIN vivekkaul_ads_weekly_arrival_curve b
  ON b.weeks_elapsed = a.actual_weeks_elapsed
GROUP BY a.weeknum

1 Answer:

Answer 0 (score: 0)

Do you have appropriate indexes?

Try the following, but you may want to check whether any existing indexes overlap with these.

CREATE NONCLUSTERED INDEX [IX_weeks_elapsed] ON vivekkaul_ads_weekly_arrival_curve (weeks_elapsed)
INCLUDE (chargeback_multiplier, refund_multiplier)

CREATE NONCLUSTERED INDEX [IX_hipal2_1921596] ON hipal2_1921596 (actual_weeks_elapsed)
INCLUDE (payment_action_type, num_accounts)
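
Note that CREATE NONCLUSTERED INDEX ... INCLUDE is SQL Server syntax, so Hive will not accept the statements above as written. With two tables under 1 MB, the elapsed time may come mostly from job launch overhead rather than from SUM(CASE) itself. A minimal sketch of what could be tried on the Hive side, assuming a MapReduce-based installation (whether these settings are already enabled, and how much they help, depends on the Hive version and cluster configuration):

```sql
-- Sketch only: assumes a MapReduce-based Hive setup; defaults vary by version.
-- Allow Hive to run sufficiently small jobs locally instead of on the cluster.
SET hive.exec.mode.local.auto=true;
-- Allow Hive to convert the join into a map-side join when one table is small.
SET hive.auto.convert.join=true;

-- Print the query plan to see how many stages the query is compiled into.
EXPLAIN
SELECT
  a.weeknum,
  SUM(CASE
        WHEN a.payment_action_type = 'chargeback' THEN b.chargeback_multiplier * a.num_accounts
        WHEN a.payment_action_type = 'refund' THEN b.refund_multiplier * a.num_accounts
      END) AS num_reversals
FROM hipal2_1921596 a
JOIN vivekkaul_ads_weekly_arrival_curve b
  ON b.weeks_elapsed = a.actual_weeks_elapsed
GROUP BY a.weeknum;
```

Running the query with and without EXPLAIN, before and after changing the settings, should show whether the join and aggregation stages or the job startup dominate the runtime.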