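I have 3 tables on Hive:
- a calendar table (containing every date in a given period)
- a customer table
- a table of customer transactions
I need to join these to get, for each date, all customers and their last transaction up to that date. The last transaction should be null only if the customer has no transaction on or before that date (by "last transaction" I mean the latest one up to the current calendar record).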
Calendar sample:
+----------+
|date |
+----------+
|2017-06-01|
|2017-06-02|
|2017-06-03|
|2017-06-04|
|2017-06-05|
|2017-06-06|
|2017-06-07|
|2017-06-08|
|2017-06-09|
|2017-06-10|
+----------+
Customer sample:
+------------+
|customer_id |
+------------+
|11544049690 |
|15506698252 |
|67015354024 |
|43622453087 |
|509 |
|42859528435 |
|506 |
|10669246896 |
|33355892704 |
|500 |
+------------+
Transaction sample:
+------------+----------+
|customer_id |trx_date |
+------------+----------+
|43622453087 |2018-05-30|
|509 |2017-10-04|
|509 |2018-01-09|
|509 |2017-11-07|
|509 |2018-01-30|
|506 |2017-10-04|
|506 |2017-12-21|
|506 |2017-11-07|
|506 |2017-11-07|
|500 |2017-10-04|
+------------+----------+
The result should look more or less like this:
+----------+------------+--------------+
|date |customer_id |last_trx_date |
+----------+------------+--------------+
|2017-10-04|11544049690 | |
|2017-10-04|15506698252 | |
|2017-10-04|67015354024 | |
|2017-10-04|43622453087 | |
|2017-10-04|509 |2017-10-04 |
|2017-10-04|42859528435 | |
|2017-10-04|506 |2017-10-04 |
|2017-10-04|10669246896 | |
|2017-10-04|33355892704 | |
|2017-10-04|500 |2017-10-04 |
|2017-10-05|11544049690 | |
|2017-10-05|15506698252 | |
|2017-10-05|67015354024 | |
|2017-10-05|43622453087 | |
|2017-10-05|509 |2017-10-04 |
|2017-10-05|42859528435 | |
|2017-10-05|506 |2017-10-04 |
|2017-10-05|10669246896 | |
|2017-10-05|33355892704 | |
|2017-10-05|500 |2017-10-04 |
|2017-10-06|11544049690 | |
|2017-10-06|15506698252 | |
|2017-10-06|67015354024 | |
|2017-10-06|43622453087 | |
|2017-10-06|509 |2017-10-04 |
|2017-10-06|42859528435 | |
|2017-10-06|506 |2017-10-04 |
|2017-10-06|10669246896 | |
|2017-10-06|33355892704 | |
|2017-10-06|500 |2017-10-04 |
.
.
.
|2017-11-07|11544049690 | |
|2017-11-07|15506698252 | |
|2017-11-07|67015354024 | |
|2017-11-07|43622453087 | |
|2017-11-07|509 |2017-11-07 |
|2017-11-07|42859528435 | |
|2017-11-07|506 |2017-11-07 |
|2017-11-07|10669246896 | |
|2017-11-07|33355892704 | |
|2017-11-07|500 |2017-10-04 |
+----------+------------+--------------+
My latest attempt looks like this:
SELECT
cal.date as calendar_date,
c.customer_id,
to_date(trx.tstamp) as trx_date,
max(to_date(trx.tstamp)) over (
order by trx.date, trx.customer_id rows unbounded preceding) as last_trx
FROM
calendartable cal
LEFT JOIN customer t1
LEFT JOIN transactions t2
ON (c.customer_id == trx.customer_id)
WHERE to_date(cal.date) <= current_date or cal.date is null
Answer 0 (score: 0)
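A cross join is needed to generate a row for every date in the calendar table for each customer. A left join to the transactions, restricted to those on or before the calendar date, combined with a max() aggregate then produces the desired result.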
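SELECT cal.date AS calendar_date,
       cst.customer_id,
       max(to_date(trx.tstamp)) AS last_trx  -- NULL when there is no transaction up to that date
FROM calendartable cal
CROSS JOIN customer cst                      -- one row per (date, customer) pair
LEFT JOIN transactions trx
       ON cst.customer_id = trx.customer_id
      AND to_date(trx.tstamp) <= to_date(cal.date)  -- compare dates so same-day transactions count
WHERE to_date(cal.date) <= current_date
GROUP BY cal.date, cst.customer_id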
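To check the cross join plus left join idea on the sample data from the question, here is a minimal sketch that runs the same query shape against an in-memory SQLite database. SQLite is only a stand-in for Hive here, with its `date()` function playing the role of `to_date()`. The time of day on each transaction is made up, and the Python variable names are hypothetical. The second query moves the date filter into a `CASE` inside `max()`. That form may help on older Hive versions, which, as far as I know, accepted only equality conditions in `ON` before 2.2.0.

```python
import sqlite3

# SQLite (in memory) stands in for Hive here; this is an assumption made only
# so the sketch is runnable. SQLite's date() plays the role of Hive's to_date().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendartable (date TEXT);
    CREATE TABLE customer (customer_id INTEGER);
    CREATE TABLE transactions (customer_id INTEGER, tstamp TEXT);
""")

# A few dates and customers taken from the samples in the question.
conn.executemany("INSERT INTO calendartable VALUES (?)",
                 [("2017-10-04",), ("2017-10-05",), ("2017-11-07",)])
conn.executemany("INSERT INTO customer VALUES (?)",
                 [(500,), (506,), (509,), (43622453087,)])

# Transaction dates come from the question; the time of day is made up.
trx_dates = [
    (43622453087, "2018-05-30"),
    (509, "2017-10-04"), (509, "2018-01-09"), (509, "2017-11-07"), (509, "2018-01-30"),
    (506, "2017-10-04"), (506, "2017-12-21"), (506, "2017-11-07"), (506, "2017-11-07"),
    (500, "2017-10-04"),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(cid, d + " 09:30:00") for cid, d in trx_dates])

# Variant 1: the date filter sits in the LEFT JOIN condition.
JOIN_FILTER_QUERY = """
    SELECT cal.date AS calendar_date,
           cst.customer_id,
           max(date(trx.tstamp)) AS last_trx
    FROM calendartable cal
    CROSS JOIN customer cst
    LEFT JOIN transactions trx
           ON cst.customer_id = trx.customer_id
          AND date(trx.tstamp) <= cal.date
    WHERE cal.date <= date('now')
    GROUP BY cal.date, cst.customer_id
    ORDER BY cal.date, cst.customer_id
"""

# Variant 2: equality-only join; the date filter moves into the aggregate.
CASE_FILTER_QUERY = """
    SELECT cal.date AS calendar_date,
           cst.customer_id,
           max(CASE WHEN date(trx.tstamp) <= cal.date
                    THEN date(trx.tstamp) END) AS last_trx
    FROM calendartable cal
    CROSS JOIN customer cst
    LEFT JOIN transactions trx
           ON cst.customer_id = trx.customer_id
    WHERE cal.date <= date('now')
    GROUP BY cal.date, cst.customer_id
    ORDER BY cal.date, cst.customer_id
"""

rows_join = conn.execute(JOIN_FILTER_QUERY).fetchall()
rows_case = conn.execute(CASE_FILTER_QUERY).fetchall()

for row in rows_join:
    print(row)
```

Both variants join every transaction of a customer to every calendar date before aggregating, so on large tables the intermediate result can get big. Narrowing the calendar range in the `WHERE` clause keeps that under control.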