Redshift - latest date is missing when joining two tables

Time: 2020-01-12 01:13:28

Tags: sql amazon-redshift

I have two tables (call them table A and table B).

Table A contains only the most recent month of data. Table B stores all of the data.

|user | table_A_date | amount_table_A|
|-----| ------------ | ------------- |
| A   |2019-11-30    |1111.0         |
| A   |2019-12-02    |1111.0         |
| A   |2019-12-05    |1111.0         |
| A   |2019-12-09    |1111.0         |


|user | table_B_date | amount_table_B|
|-----| ------------ | ------------- |
| A   |2019-11-25    |1111.0         |
| A   |2019-12-02    |1111.0         |
| A   |2019-12-05    |1111.0         |
| A   |2019-12-10    |1111.0         |

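I need to find the difference between the dates in these two tables, but when I left join the two tables, some of the table B dates come back null: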

|user     | table_A_date |  table_B_date | amount_table_A|
| ------- | -------      | -------       | -----   |
| A       |2019-11-30    |   Null        |1111.0   |
| A       |2019-12-02    |2019-12-02     |1111.0   |
| A       |2019-12-05    |2019-12-05     |1111.0   |
| A       |2019-12-09    |    Null       |1111.0   |

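I tried using the last_value() over () function, but the first null value is still not filled in. How can I get the previous most recent table B value for each user (for user A, 2019-11-25)?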

1 answer:

Answer 0: (score: 1)

You can use a full join together with lag() / last_value(), and then filter:

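select ab.*
from (select coalesce(a."user", b."user") as "user",  -- user is a reserved word, so it is quoted
             a.date as a_date, a.amount as a_amount,
             -- take the exact match if there is one, otherwise the latest earlier row from b;
             -- the window is ordered by the combined date so that rows found only in a
             -- sort between the rows of b instead of after them
             coalesce(b.date,
                      lag(b.date) ignore nulls over (partition by coalesce(a."user", b."user")
                                                     order by coalesce(a.date, b.date))
                     ) as b_date,
             coalesce(b.amount,
                      lag(b.amount) ignore nulls over (partition by coalesce(a."user", b."user")
                                                       order by coalesce(a.date, b.date))
                     ) as b_amount
      from a full join
           b
           on a."user" = b."user" and a.date = b.date
     ) ab
where a_date is not null;  -- keep only the rows that exist in table A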
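
To sanity-check the expected result on the sample data from the question, here is a minimal sketch that loads the rows into an in-memory SQLite database from Python. It is not the Redshift query above: SQLite has no `ignore nulls`, so this sketch swaps in a correlated subquery that picks the latest table B date on or before each table A date. The table names `a` and `b`, the column `user_id` (used instead of the reserved word `user`) and the function `latest_b_dates` are all assumptions made up for this illustration.

```python
import sqlite3

# Sample rows copied from the question. Table and column names
# (a, b, user_id, date, amount) are assumptions for this sketch.
A_ROWS = [("A", "2019-11-30", 1111.0), ("A", "2019-12-02", 1111.0),
          ("A", "2019-12-05", 1111.0), ("A", "2019-12-09", 1111.0)]
B_ROWS = [("A", "2019-11-25", 1111.0), ("A", "2019-12-02", 1111.0),
          ("A", "2019-12-05", 1111.0), ("A", "2019-12-10", 1111.0)]

# Correlated subquery: for every row of a, the latest b.date that is
# not after a.date for the same user. ISO-formatted date strings
# compare correctly as plain text.
QUERY = """
select a.user_id,
       a.date as a_date,
       (select max(b.date)
          from b
         where b.user_id = a.user_id
           and b.date <= a.date) as b_date
  from a
 order by a.user_id, a.date
"""


def latest_b_dates():
    """Return (user_id, a_date, b_date) tuples for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table a (user_id text, date text, amount real)")
    conn.execute("create table b (user_id text, date text, amount real)")
    conn.executemany("insert into a values (?, ?, ?)", A_ROWS)
    conn.executemany("insert into b values (?, ?, ?)", B_ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_b_dates():
        print(row)
```

With this data, the 2019-11-30 row should pick up 2019-11-25 and the 2019-12-09 row should pick up 2019-12-05, which are the two values that came back null from the plain left join. On a large table the window-function version is likely the better fit; the subquery form is used here only because it runs on SQLite.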