Merging data with the same column names

Time: 2020-09-16 08:51:31

Tags: sql hive left-join union hiveql

Apologies for the confusing title; I'm not sure of the best way to word it. I have two daily tables. The first looks like this:

| yyyy_mm_dd | x_id | feature     | impl_status   |
|------------|------|-------------|---------------|
| 2020-08-18 | 1    | Basic       | first_contact |
| 2020-08-18 | 1    | Last Minute | first_contact |
| 2020-08-18 | 1    | Geo         | first_contact |
| 2020-08-18 | 2    | Basic       | implemented   |
| 2020-08-18 | 2    | Last Minute | first_contact |
| 2020-08-18 | 2    | Geo         | no_contact    |
| 2020-08-18 | 3    | Basic       | no_contact    |
| 2020-08-18 | 3    | Last Minute | no_contact    |
| 2020-08-18 | 3    | Geo         | implemented   |

The second looks like this:

| yyyy_mm_dd | x_id | payment |
|------------|------|---------|
| 2020-08-18 | 1    | 0       |
| 2020-08-18 | 2    | 0       |
| 2020-08-18 | 3    | 1       |
| 2020-08-19 | 1    | 0       |
| 2020-08-19 | 2    | 0       |
| 2020-08-19 | 3    | 1       |

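I want to build a query in which payment becomes a feature in the first table. Because payment is a boolean (1/0), it will never have a first_contact status. This is what I have tried: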

select
    t1.yyyy_mm_dd,
    t1.x_id,
    t1.impl_status
from
    schema.table1 t1
left join (
    select
        yyyy_mm_dd,
        x_id,
        'payment' as feature,
        if(payment = 1, 'implemented', 'no_contact') as impl_status
    from
        schema.table2
) t2 on t2.yyyy_mm_dd = t1.yyyy_mm_dd and t2.x_id = t1.x_id

However, because the column names are ambiguous, I have to select either t1.impl_status or t2.impl_status. The two columns are not merged into one.

With that in mind, the expected output would look like this:

| yyyy_mm_dd | x_id | feature     | impl_status   |
|------------|------|-------------|---------------|
| 2020-08-18 | 1    | Basic       | first_contact |
| 2020-08-18 | 1    | Last Minute | first_contact |
| 2020-08-18 | 1    | Geo         | first_contact |
| 2020-08-18 | 1    | Payment     | no_contact    |
| 2020-08-18 | 2    | Basic       | implemented   |
| 2020-08-18 | 2    | Last Minute | first_contact |
| 2020-08-18 | 2    | Geo         | no_contact    |
| 2020-08-18 | 2    | Payment     | no_contact    |
| 2020-08-18 | 3    | Basic       | no_contact    |
| 2020-08-18 | 3    | Last Minute | no_contact    |
| 2020-08-18 | 3    | Geo         | implemented   |
| 2020-08-18 | 3    | Payment     | implemented   |
| 2020-08-19 ...
 ...

1 answer:

Answer 0: (score: 1)

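A join places the columns of the two tables side by side; it does not add rows. To append the payment rows beneath the existing feature rows, you can use union all: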

select yyyy_mm_dd, x_id, feature, impl_status from table1 t1
union all
select yyyy_mm_dd, x_id, 'Payment', case when payment = 0 then 'no_contact' else 'implemented' end from table2
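
If you want to try this without a Hive cluster, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module and runs the same union all. SQLite is only a stand-in here (an assumption on my part, not something from the question), so the schema prefix is dropped and the Hive-specific if() is replaced by a standard case expression. The order by clause is added purely to make the output stable.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (yyyy_mm_dd text, x_id integer, feature text, impl_status text);
    create table table2 (yyyy_mm_dd text, x_id integer, payment integer);
""")

# Sample rows copied from the first table in the question.
conn.executemany(
    "insert into table1 values (?, ?, ?, ?)",
    [
        ("2020-08-18", 1, "Basic", "first_contact"),
        ("2020-08-18", 1, "Last Minute", "first_contact"),
        ("2020-08-18", 1, "Geo", "first_contact"),
        ("2020-08-18", 2, "Basic", "implemented"),
        ("2020-08-18", 2, "Last Minute", "first_contact"),
        ("2020-08-18", 2, "Geo", "no_contact"),
        ("2020-08-18", 3, "Basic", "no_contact"),
        ("2020-08-18", 3, "Last Minute", "no_contact"),
        ("2020-08-18", 3, "Geo", "implemented"),
    ],
)

# Sample rows copied from the second table in the question.
conn.executemany(
    "insert into table2 values (?, ?, ?)",
    [
        ("2020-08-18", 1, 0),
        ("2020-08-18", 2, 0),
        ("2020-08-18", 3, 1),
        ("2020-08-19", 1, 0),
        ("2020-08-19", 2, 0),
        ("2020-08-19", 3, 1),
    ],
)

# union all stacks the payment rows under the feature rows; both selects
# must return the same number of columns in the same order.
query = """
    select yyyy_mm_dd, x_id, feature, impl_status from table1
    union all
    select yyyy_mm_dd, x_id, 'Payment' as feature,
           case when payment = 1 then 'implemented' else 'no_contact' end as impl_status
    from table2
    order by yyyy_mm_dd, x_id, feature
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

Each row of table2 becomes one extra row with feature set to 'Payment', so the result contains every row of table1 plus one Payment row per date and x_id in table2.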