A better SQL approach to get the latest customer record?

Time: 2017-07-08 10:04:12

Tags: sql hiveql impala

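Imagine we have two tables: customer and purchases. The purchases table has customerID, purchaseDateTime, and so on.

What is the best way to select the most recent purchase for every customer in Hive or Impala SQL?

This is what I have looked at so far: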

With recent as (
    select customerID, max(purchaseDateTime) as dt
    from purchases
    group by customerID
)
Select *
from customer c
join recent r
    on c.customerID = r.customerID
join purchases p
    on r.customerID = p.customerID and
       p.purchaseDateTime = r.dt
It doesn't seem very efficient...

1 answer:

Answer 0: (score: 1)

I would use row_number():

Select c.*, p.*
from customer c join
     (select p.*,
             row_number() over (partition by p.customerid order by p.purchaseDateTime desc) as seqnum
      from purchases p
     ) p
     on c.customerId = p.customerid
where seqnum = 1;

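row_number() is an ANSI standard function, so this is standard SQL. In general, it should be faster than the explicit group by followed by a join back to purchases.

One difference is that, when there are ties (two purchases by the same customer sharing the latest purchaseDateTime), this returns a single row for that customer. Your query would return multiple rows. If you want that behavior, change row_number() to rank().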
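
To make the tie behavior concrete, here is a minimal sketch of the rank() variant. It assumes the same customer and purchases tables and column names as in the question. Because rank() gives tied rows the same number, every purchase that shares a customer's latest purchaseDateTime gets seqnum = 1 and passes the filter.

```sql
-- Sketch only: assumes the customer and purchases tables from the question.
select c.*, p.*
from customer c join
     (select p.*,
             -- rank() assigns the same number to tied rows,
             -- so all purchases at the latest timestamp are kept
             rank() over (partition by p.customerid order by p.purchaseDateTime desc) as seqnum
      from purchases p
     ) p
     on c.customerId = p.customerid
where seqnum = 1;
```

If you need exactly one row per customer and want the choice among ties to be predictable, another option is to keep row_number() and add a tie-breaking column to the order by. Which column is suitable depends on your schema; nothing in the question says what else purchases contains.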