How do I select one row per group after a union of two tables?

Date: 2019-07-05 04:07:59

Tags: sql hiveql

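I need to select the latest row for each user from two tables that share the same schema.

Table A and table B have the same schema, as shown below:

Table A: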

user_id, time_stamp, order_id

1,20190101,100

2,20190103,201

3,20190102,300

5,20180209,99

Table B:

user_id, time_stamp, order_id

1,20190102,101

2,20190101,200

3,20190103,305

4,20190303,900

I want the output to be the union of A and B, keeping only each user's most recent row as determined by time_stamp:

The output should be:

1,20190102,101

2,20190103,201

3,20190103,305

4,20190303,900

5,20180209,99

How do I write this SQL?

4 answers:

Answer 0 (score: 1)

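You can write a query like the following. Note that DISTINCT ON is a PostgreSQL extension, so it will not run unchanged in Hive: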

with unionedTable as (
    select * from tableA
    union
    select * from tableB
),
newerUsersTable as (
    -- keep the first row per user_id after sorting newest first
    select distinct on (u.user_id) u.*
    from unionedTable u
    order by u.user_id, u.time_stamp desc
)
select * from newerUsersTable;

Answer 1 (score: 0)

The main idea is to FULL OUTER JOIN the two tables and then use UNION [ALL] to return the data set. Consider the following WITH statement followed by a SELECT:

with a( user_id, time_stamp, order_id ) as
(
 select 1,20190101,100 union all
 select 2,20190103,201 union all    
 select 3,20190102,300 union all    
 select 5,20180209,99  
), b( user_id, time_stamp, order_id ) as
(
 select 1,20190102,101 union all
 select 2,20190101,200 union all    
 select 3,20190103,305 union all    
 select 4,20190303,900 
), c as
(
select a.user_id as user_id_a, a.time_stamp as time_stamp_a, a.order_id as order_id_a,
       b.user_id as user_id_b, b.time_stamp as time_stamp_b, b.order_id as order_id_b
  from a full outer join b
    on a.user_id = b.user_id 
), d as
(
select user_id_a, time_stamp_a, order_id_a  
  from c
 where coalesce(time_stamp_b,time_stamp_a) <= time_stamp_a 
union all 
select user_id_b, time_stamp_b, order_id_b 
  from c
 where time_stamp_b >= coalesce(time_stamp_a,time_stamp_b)
)
select user_id_a as user_id, time_stamp_a as time_stamp, order_id_a as order_id
  from d
 order by user_id_a;

user_id time_stamp  order_id
1       20190102    101
2       20190103    201
3       20190103    305
4       20190303    900
5       20180209    99


Answer 2 (score: 0)

Use GROUP BY user_id to get one group per user_id.

Use max(time_stamp) to find each user's most recent row.

select aa.*
from (select * from a union select * from b) aa
join (select user_id, max(time_stamp) as new_time
      from (select * from a union select * from b) u
      group by u.user_id) bb
  on bb.new_time = aa.time_stamp and bb.user_id = aa.user_id
order by aa.user_id;

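One caveat of joining back on max(time_stamp): the join keeps every row that shares the maximum, so a user whose newest time_stamp appears in both tables comes back more than once. The sketch below illustrates this with Python's sqlite3 module, which is an assumption made only so the example can be run (the question targets Hive). User 7 and its rows are made-up data, not part of the question.

```python
import sqlite3

# The join-on-max query from this answer, unchanged in logic.
JOIN_ON_MAX = """
select aa.*
from (select * from a union select * from b) aa
join (select user_id, max(time_stamp) as new_time
      from (select * from a union select * from b) u
      group by user_id) bb
  on bb.new_time = aa.time_stamp and bb.user_id = aa.user_id
order by aa.user_id
"""


def rows_for_tied_user():
    """Run the query on a hypothetical user whose newest time_stamp ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table a (user_id integer, time_stamp integer, order_id integer)")
    conn.execute("create table b (user_id integer, time_stamp integer, order_id integer)")
    # Hypothetical data: user 7 has the same time_stamp in both tables.
    conn.execute("insert into a values (7, 20190401, 700)")
    conn.execute("insert into b values (7, 20190401, 701)")
    rows = conn.execute(JOIN_ON_MAX).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Both rows for user 7 are returned, not one.
    print(rows_for_tied_user())
```

If exactly one row per user is required even when timestamps tie, a row_number() approach with an extra tie-breaking column in its ORDER BY is one possible alternative.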

Answer 3 (score: 0)

I would simply do:

select user_id, time_stamp, order_id
from (select ab.*,
             row_number() over (partition by user_id order by time_stamp desc) as seqnum
      from (select a.* from a union all
            select b.* from b
           ) ab
     ) ab
where seqnum = 1;
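To check this query against the sample data from the question, here is a minimal sketch that loads both tables into an in-memory SQLite database and runs the same row_number() query. Using Python's sqlite3 module is an assumption made purely for illustration (the question targets Hive), and it needs an SQLite build with window-function support (3.25 or later). The trailing `order by user_id` is also an addition, included only to make the result order deterministic.

```python
import sqlite3

# Sample rows copied from the question.
ROWS_A = [(1, 20190101, 100), (2, 20190103, 201), (3, 20190102, 300), (5, 20180209, 99)]
ROWS_B = [(1, 20190102, 101), (2, 20190101, 200), (3, 20190103, 305), (4, 20190303, 900)]

# Same query as in the answer, plus an ORDER BY for a stable result order.
QUERY = """
select user_id, time_stamp, order_id
from (select ab.*,
             row_number() over (partition by user_id order by time_stamp desc) as seqnum
      from (select a.* from a union all
            select b.* from b
           ) ab
     ) ab
where seqnum = 1
order by user_id
"""


def latest_rows():
    """Return the newest row per user_id across tables a and b."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("a", ROWS_A), ("b", ROWS_B)):
        conn.execute(
            f"create table {name} (user_id integer, time_stamp integer, order_id integer)"
        )
        conn.executemany(f"insert into {name} values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_rows():
        print(row)
```

On the question's data this yields the five rows listed under "The output should be". If two rows for one user share the same time_stamp, row_number() still keeps only one of them, but which one is arbitrary unless the ORDER BY inside the window includes a tie-breaker.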