Greatest timestamp per group

Asked: 2019-05-10 08:21:51

Tags: mysql hive hiveql

I have a fairly simple table called assignment:

CREATE TABLE IF NOT EXISTS assignment (h_id bigint, country string, p_name string)

It has one row per h_id:

INSERT INTO TABLE assignment
  VALUES (19874, "France", "Example_Name"), 
         (21548, "France", "Example_Name"),
         (34569, "Germany", "Different_Name"),
         (47337, "Greece", "Another Name"),
         (54682, "Greece", "Example Name"),
         (64963, "France", "Different Name");

I want to join assignment to a second table, state:

CREATE TABLE IF NOT EXISTS state (id bigint, xml_id bigint, datetime_in string, datetime_out string)

xml_id is the key that joins to h_id, and state has multiple rows for each h_id.

INSERT INTO TABLE state
  VALUES (1, 19874, "2014-04-03 10:38:31.0", "2017-11-30 10:45:00.0"), 
         (2, 19874, "2014-02-05 10:21:33.0", "2019-02-02 10:30:35.0"),
         (3, 19874, "2019-02-26 14:34:17.0", null),
         (4, 54682, "2019-03-07 14:43:34.0", null),
         (5, 54682, "2019-02-25 10:47:09.0", null),
         (6, 64963, "2019-02-06 12:50:05.0", "2019-05-04 16:15:08.0");

The output I want is the data from assignment together with the latest datetime_in from state.

This is what I have tried:

SELECT xml_id, datetime_in
    FROM (SELECT *,
        dense_rank() over (partition by xml_id ORDER BY datetime_in DESC) as rank
        FROM state s
        WHERE s.xml_id IN (SELECT a.h_id FROM assignment a)
    ) temp
    WHERE rank = 1

The problem is that I only get ~2k rows back, even though assignment has ~7k rows.

If I run this:

SELECT COUNT(*) FROM state s
WHERE s.xml_id IN (SELECT a.h_id FROM assignment a)

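I get ~8k results, which is what I expect, because state has multiple rows for each a.h_id. What I don't understand is why I only get about 2k rows when I try to fetch the latest datetime_in from state for each assignment.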

1 Answer:

Answer 0: (score: 0)

Some keys in the assignment table do not exist in the state table; it seems that only about 2K keys are present in both tables.
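One way to confirm this is to count the distinct keys that appear in both tables. The query below is only a sketch built on the table and column names from the question; the alias keys_in_both is made up for illustration.

```sql
-- Count how many distinct assignment keys have at least one row in state.
-- keys_in_both is an illustrative alias, not a name from the question.
SELECT COUNT(DISTINCT a.h_id) AS keys_in_both
  FROM assignment a
       INNER JOIN state s ON s.xml_id = a.h_id;
```

If this returns roughly 2K, the ranked query is behaving correctly: it can return at most one group per xml_id that has a match in assignment. On the sample data in the question (with the missing commas in the INSERT statements added), only 19874, 54682 and 64963 appear in both tables, so the count would be 3.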

Also run this query to find the keys that exist only in assignment:

SELECT a.h_id 
  FROM assignment a 
       left join (select distinct s.xml_id from state s) s on  a.h_id =  s.xml_id
 WHERE s.xml_id is null;

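If the state table can contain several records with the same timestamp for a given xml_id, dense_rank will assign 1 to every record tied for the latest timestamp, so you can get more than one row per xml_id. If you need exactly one record, use row_number() instead. If you need all assignment records even when no corresponding record exists in state, use a left join. If you only need the keys that exist in both tables, replace the left join with an inner join: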

select a.*, s.*
  from assignment a
       left join (SELECT s.*,
                        row_number() over (partition by xml_id ORDER BY datetime_in DESC) as rn
                   FROM state s
                 ) s on s.xml_id = a.h_id and s.rn=1
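To see whether the dense_rank tie case actually occurs in your data, a check along the following lines may help. It is a sketch that assumes a tie means identical xml_id and datetime_in values; n_rows is an illustrative alias.

```sql
-- List (xml_id, datetime_in) pairs that occur more than once in state.
-- Any row returned here is a tie that dense_rank() would rank identically.
SELECT s.xml_id, s.datetime_in, COUNT(*) AS n_rows
  FROM state s
 GROUP BY s.xml_id, s.datetime_in
HAVING COUNT(*) > 1;
```

On the sample rows in the question this returns nothing, because every timestamp is distinct. Note that row_number() breaks ties arbitrarily unless the ORDER BY includes a further column, for example id, to make the choice deterministic.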