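Joining the same table twice under different conditions

Time: 2019-02-28 22:08:20

Tags: sql hiveql

I have a table containing a daily summary of user activity across a set of products. In theory there is only one row per <UserId, Product, Client, Date> tuple, because the table is produced by a GROUP BY. Let's call it UserActivity.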

UserActivity:

UserId | Product | Client     | Date
------------------------------------------
John   | Bank    | Mobile App | 2019-02-28
John   | Bank    | Desktop App| 2019-02-28
Sally  | Gym     | Web App    | 2019-02-28

I have another table, call it FirstLastSeen, which records for each UserId the date on which they first used each Product and Client.

FirstLastSeen:

UserId | Product | Client     | Date
------------------------------------------
John   | Bank    | Mobile App | 2019-01-01
John   | Bank    | Desktop App| 2019-02-28
Sally  | Gym     | Web App    | 2019-02-28

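I want to compute whether a user is "new" to a Product and whether they are new to a Client. That means the date they first used this Product equals Date, and the date they first used this Product and Client equals Date, respectively. So, a table like this: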

UserId | Product | Client     | Date       | IsNewProduct | IsNewClient
-----------------------------------------------------------------
John   | Bank    | Mobile App | 2019-02-28 | False        | False        // Used on 01-01
John   | Bank    | Desktop App| 2019-02-28 | False        | True         // First time used this client was same day
Sally  | Gym     | Web App    | 2019-02-28 | True         | True         // First time we saw her in this product and client

One approach is:

SELECT 
    UA.UserId, 
    UA.Product, 
    UA.Client, 
    CASE WHEN FLS.Date = UA.Date THEN TRUE ELSE FALSE END AS FirstSeenClient
FROM UserActivity as UA
LEFT JOIN FirstLastSeen AS FLS 
    ON  UA.UserId=FLS.UserId 
    AND UA.Product=FLS.Product 
    AND UA.Client=FLS.Client;

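This gives me the FirstSeenClient I want. There is guaranteed to be one row corresponding to their usage. I don't know how to get FirstSeenProduct. I suspect the answer lies in a subquery or a window function, but I'm not sure how to write it, perhaps MIN(Date) OVER (PARTITION BY UserId, Product). I'm new to window functions, but this would give me the earliest Date on which the user was seen in the Product, and then I could do another SELECT to compare against it? Does the window function guarantee that Date is the same as that of the row it is computed for?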
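As a rough sketch of the window function mentioned above: MIN(Date) OVER (PARTITION BY UserId, Product) returns the partition's earliest Date on every row of that partition, so it is not guaranteed to equal the row's own Date; the two still have to be compared explicitly. The example below runs the idea against the sample FirstLastSeen data. It uses SQLite through Python's sqlite3 module as a stand-in for Hive (an assumption made only so the snippet is self-contained; window functions need SQLite 3.25 or later), and the column alias FirstProductDate is a hypothetical name.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FirstLastSeen (UserId TEXT, Product TEXT, Client TEXT, Date TEXT);
INSERT INTO FirstLastSeen VALUES
    ('John',  'Bank', 'Mobile App',  '2019-01-01'),
    ('John',  'Bank', 'Desktop App', '2019-02-28'),
    ('Sally', 'Gym',  'Web App',     '2019-02-28');
""")

# FirstProductDate (hypothetical alias) is the earliest Date within each
# (UserId, Product) partition, repeated on every row of that partition.
rows = conn.execute("""
SELECT UserId, Product, Client, Date,
       MIN(Date) OVER (PARTITION BY UserId, Product) AS FirstProductDate
FROM FirstLastSeen
ORDER BY UserId, Client
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, John's Desktop App row carries Date 2019-02-28 but FirstProductDate 2019-01-01, which is what distinguishes "new to the client" from "new to the product".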

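1 Answer:

Answer 0 (score: 0)

Your query looks fine so far. To check whether this is the first time the customer has used the product, you can use ROW_NUMBER() to assign a rank to each record within the group of records that share the same UserId and the same Product, ordered by Date. When the row number is 1, you know you are dealing with a new product.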

SELECT 
    UA.UserId, 
    UA.Product, 
    UA.Client, 
    CASE 
        WHEN ROW_NUMBER() OVER(PARTITION BY UA.UserId, UA.Product ORDER BY UA.Date) = 1 
        THEN true 
        ELSE false 
    END AS IsNewProduct,
    CASE 
        WHEN UA.Date = FLS.Date
        THEN true
        ELSE false
    END AS IsNewClient
FROM UserActivity as UA
LEFT JOIN FirstLastSeen AS FLS 
    ON UA.UserId   = FLS.UserId 
    AND UA.Product = FLS.Product 
    AND UA.Client  = FLS.Client;

This demo on DB Fiddle, with your sample data, returns:

| UserId | Product | Client      | IsNewProduct | IsNewClient |
| ------ | ------- | ----------- | ------------ | ----------- |
| John   | Bank    | Mobile App  | 1            | 0           |
| John   | Bank    | Desktop App | 0            | 1           |
| Sally  | Gym     | Web App     | 1            | 1           |
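Note that in the output above John's Mobile App row gets IsNewProduct = 1, whereas the expected table in the question marks it False: ROW_NUMBER() ranks only the rows present in UserActivity and does not look at the earlier date stored in FirstLastSeen. One possible alternative, sketched below, joins FirstLastSeen twice: once on the full key for the client flag, and once as a derived table grouped by UserId and Product for the product flag. This is only a sketch under stated assumptions: SQLite via Python's sqlite3 stands in for Hive, the alias FP and the column FirstProductDate are hypothetical names, and the flags are written as 1/0 rather than true/false for portability.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserActivity  (UserId TEXT, Product TEXT, Client TEXT, Date TEXT);
CREATE TABLE FirstLastSeen (UserId TEXT, Product TEXT, Client TEXT, Date TEXT);
INSERT INTO UserActivity VALUES
    ('John',  'Bank', 'Mobile App',  '2019-02-28'),
    ('John',  'Bank', 'Desktop App', '2019-02-28'),
    ('Sally', 'Gym',  'Web App',     '2019-02-28');
INSERT INTO FirstLastSeen VALUES
    ('John',  'Bank', 'Mobile App',  '2019-01-01'),
    ('John',  'Bank', 'Desktop App', '2019-02-28'),
    ('Sally', 'Gym',  'Web App',     '2019-02-28');
""")

query = """
SELECT
    UA.UserId,
    UA.Product,
    UA.Client,
    -- Product flag: compare against the earliest date over all clients.
    CASE WHEN FP.FirstProductDate = UA.Date THEN 1 ELSE 0 END AS IsNewProduct,
    -- Client flag: compare against the date for this exact client.
    CASE WHEN FLS.Date = UA.Date THEN 1 ELSE 0 END AS IsNewClient
FROM UserActivity AS UA
LEFT JOIN FirstLastSeen AS FLS
    ON  UA.UserId  = FLS.UserId
    AND UA.Product = FLS.Product
    AND UA.Client  = FLS.Client
LEFT JOIN (
    -- Second use of FirstLastSeen: one row per (UserId, Product).
    SELECT UserId, Product, MIN(Date) AS FirstProductDate
    FROM FirstLastSeen
    GROUP BY UserId, Product
) AS FP
    ON  UA.UserId  = FP.UserId
    AND UA.Product = FP.Product
ORDER BY UA.UserId, UA.Client
"""

flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

On the sample data this yields 0/1 for John's Desktop App row, 0/0 for his Mobile App row, and 1/1 for Sally, which matches the table the question asks for.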