I have a fairly simple table named assignment:
CREATE TABLE IF NOT EXISTS assignment (h_id bigint, country string, p_name string)
It contains one row per h_id:
INSERT INTO TABLE assignment
VALUES (19874, "France", "Example_Name"),
(21548, "France", "Example_Name"),
(34569, "Germany", "Different_Name"),
(47337, "Greece", "Another Name"),
(54682, "Greece", "Example Name"),
(64963, "France", "Different Name");
I want to join assignment to a second table, state:
CREATE TABLE IF NOT EXISTS state (id bigint, xml_id bigint, datetime_in string, datetime_out string)
xml_id is the join key that matches h_id, and each h_id has multiple rows in state.
INSERT INTO TABLE state
VALUES (1, 19874, "2014-04-03 10:38:31.0", "2017-11-30 10:45:00.0"),
(2, 19874, "2014-02-05 10:21:33.0", "2019-02-02 10:30:35.0"),
(3, 19874, "2019-02-26 14:34:17.0", null),
(4, 54682, "2019-03-07 14:43:34.0", null),
(5, 54682, "2019-02-25 10:47:09.0", null),
(6, 64963, "2019-02-06 12:50:05.0", "2019-05-04 16:15:08.0");
My desired output is the data from assignment together with the latest datetime_in from state.
This is what I tried:
SELECT xml_id, datetime_in
FROM (SELECT *,
dense_rank() over (partition by xml_id ORDER BY datetime_in DESC) as rank
FROM state s
WHERE s.xml_id IN (SELECT a.h_id FROM assignment a)
) temp
WHERE rank = 1
The problem is that I only get ~2k rows back, even though assignment has ~7k rows.
If I run this instead:
SELECT COUNT(*) FROM state s
WHERE s.xml_id IN (SELECT a.h_id FROM assignment a)
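I get ~8k results. I expect that, because each a.h_id has multiple rows in state. However, I don't understand why I only get ~2k rows when I try to fetch the latest datetime_in from state for each assignment.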
Answer 0 (score: 0)
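Some keys in the assignment table do not exist in the state table; it looks like only ~2k keys are present in both tables. The ~8k figure counts matching state rows, not keys, and keeping only rank = 1 collapses those rows to roughly one per distinct xml_id.
Also run this query to find the keys that exist only in assignment: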
SELECT a.h_id
FROM assignment a
left join (select distinct s.xml_id from state s) s on a.h_id = s.xml_id
WHERE s.xml_id is null;
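The difference between counting rows and counting keys can be checked on the sample data from the question. The sketch below is only an illustration: it loads those rows into an in-memory SQLite database through Python's sqlite3 module (an assumption made so that it runs anywhere; the question appears to use Hive), and it renames the rank alias to rnk. It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# SQLite stands in for Hive here (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignment (h_id INTEGER, country TEXT, p_name TEXT);
CREATE TABLE state (id INTEGER, xml_id INTEGER, datetime_in TEXT, datetime_out TEXT);
INSERT INTO assignment VALUES
  (19874, 'France', 'Example_Name'),
  (21548, 'France', 'Example_Name'),
  (34569, 'Germany', 'Different_Name'),
  (47337, 'Greece', 'Another Name'),
  (54682, 'Greece', 'Example Name'),
  (64963, 'France', 'Different Name');
INSERT INTO state VALUES
  (1, 19874, '2014-04-03 10:38:31.0', '2017-11-30 10:45:00.0'),
  (2, 19874, '2014-02-05 10:21:33.0', '2019-02-02 10:30:35.0'),
  (3, 19874, '2019-02-26 14:34:17.0', NULL),
  (4, 54682, '2019-03-07 14:43:34.0', NULL),
  (5, 54682, '2019-02-25 10:47:09.0', NULL),
  (6, 64963, '2019-02-06 12:50:05.0', '2019-05-04 16:15:08.0');
""")

in_filter = "s.xml_id IN (SELECT a.h_id FROM assignment a)"

# Number of state ROWS whose key also appears in assignment.
matching_rows = conn.execute(
    f"SELECT COUNT(*) FROM state s WHERE {in_filter}"
).fetchone()[0]

# Number of distinct KEYS that appear in both tables.
matching_keys = conn.execute(
    f"SELECT COUNT(DISTINCT s.xml_id) FROM state s WHERE {in_filter}"
).fetchone()[0]

# The query from the question: keep only the top-ranked row per xml_id.
latest = conn.execute(f"""
    SELECT xml_id, datetime_in
    FROM (SELECT s.*,
                 dense_rank() OVER (PARTITION BY xml_id
                                    ORDER BY datetime_in DESC) AS rnk
          FROM state s
          WHERE {in_filter}) temp
    WHERE rnk = 1
""").fetchall()

# Keys that exist only in assignment (the diagnostic query above).
only_in_assignment = [row[0] for row in conn.execute("""
    SELECT a.h_id
    FROM assignment a
    LEFT JOIN (SELECT DISTINCT st.xml_id AS xml_id FROM state st) s
      ON a.h_id = s.xml_id
    WHERE s.xml_id IS NULL
    ORDER BY a.h_id
""")]

print(matching_rows, matching_keys, len(latest), only_in_assignment)
```

On the sample data, six state rows match but only three keys do, so the ranked query returns three rows and the other three h_id values show up as missing from state. The same pattern would explain ~8k matching rows shrinking to ~2k.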
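If the state table can contain several records with the same timestamp, dense_rank will assign 1 to every record that shares the latest timestamp for a given xml_id. If you need exactly one record, use row_number() instead. If you need all assignment records even when no corresponding record exists in the state table, use left join. If you only need the keys that exist in both tables, replace left join with inner join: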
select a.*, s.*
from assignment a
left join (SELECT s.*,
row_number() over (partition by xml_id ORDER BY datetime_in DESC) as rn
FROM state s
) s on s.xml_id = a.h_id and s.rn=1
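As a rough check of the query above, the following sketch runs it against the sample data, once with row_number() and once with dense_rank(). It uses an in-memory SQLite database via Python's sqlite3 module as a stand-in for Hive (an assumption, requiring SQLite 3.25 or later for window functions). The state row with id 7 is hypothetical and not part of the question; it is added only to create a tie on datetime_in. The helper latest_rows is likewise just an illustrative name.

```python
import sqlite3

# SQLite stands in for Hive here (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignment (h_id INTEGER, country TEXT, p_name TEXT);
CREATE TABLE state (id INTEGER, xml_id INTEGER, datetime_in TEXT, datetime_out TEXT);
INSERT INTO assignment VALUES
  (19874, 'France', 'Example_Name'),
  (21548, 'France', 'Example_Name'),
  (34569, 'Germany', 'Different_Name'),
  (47337, 'Greece', 'Another Name'),
  (54682, 'Greece', 'Example Name'),
  (64963, 'France', 'Different Name');
INSERT INTO state VALUES
  (1, 19874, '2014-04-03 10:38:31.0', '2017-11-30 10:45:00.0'),
  (2, 19874, '2014-02-05 10:21:33.0', '2019-02-02 10:30:35.0'),
  (3, 19874, '2019-02-26 14:34:17.0', NULL),
  (4, 54682, '2019-03-07 14:43:34.0', NULL),
  (5, 54682, '2019-02-25 10:47:09.0', NULL),
  (6, 64963, '2019-02-06 12:50:05.0', '2019-05-04 16:15:08.0'),
  -- Hypothetical row (not in the question): ties with id 6 on datetime_in.
  (7, 64963, '2019-02-06 12:50:05.0', NULL);
""")


def latest_rows(rank_fn):
    """Left join assignment to the state rows ranked 1 by rank_fn."""
    sql = f"""
        SELECT a.h_id, s.datetime_in
        FROM assignment a
        LEFT JOIN (SELECT st.*,
                          {rank_fn}() OVER (PARTITION BY xml_id
                                            ORDER BY datetime_in DESC) AS rn
                   FROM state st) s
          ON s.xml_id = a.h_id AND s.rn = 1
        ORDER BY a.h_id, s.id
    """
    return conn.execute(sql).fetchall()


with_row_number = latest_rows("row_number")  # one row per assignment
with_dense_rank = latest_rows("dense_rank")  # tied rows are all kept

for row in with_row_number:
    print(row)
print(len(with_row_number), len(with_dense_rank))
```

With row_number() every assignment appears exactly once, and the keys without any state row carry a NULL datetime_in. With dense_rank() the tied key 64963 appears twice, which is the duplication the answer warns about.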