Get a distinct ID count over a set of days (a given date and the 2 days before it)

Time: 2018-06-06 09:49:16

Tags: mysql sql pyspark

My table has 4 columns: rii, uii, rdi and udi. It looks like this:

+----------+------+----------+------+
|       rdi|   rii|       udi|   uii|
+----------+------+----------+------+
|2002-02-06|1376.Q|2002-02-06|1376.Q|
|2002-02-28|1376.Q|2002-02-28|1376.Q|
|2002-03-06|1376.Q|2002-03-06|1376.Q|
|2002-02-01|1792.T|2002-02-01|1792.T|
|2002-03-07|1802.T|2002-03-07|1802.T|
|2002-03-08|1802.T|2002-03-08|1802.T|
|2002-04-03|1802.T|2002-04-03|1802.T|
|2002-03-07|1805.T|2002-03-07|1805.T|
|2002-02-18|1810.T|2002-02-18|1810.T|
|2002-03-22|1821.T|2002-03-22|1821.T|
|2002-02-27|1862.T|2002-02-27|1862.T|
|2002-04-11|1878.T|2002-04-11|1878.T|
|2002-04-18|1884.T|2002-04-18|1884.T|
|2002-02-27|1899.T|2002-02-27|1899.T|
|2002-03-11|1924.T|2002-03-11|1924.T|
|2002-02-05|1925.T|2002-02-05|1925.T|
|2002-01-23|1926.T|2002-01-23|1926.T|
|2002-03-19|1926.T|2002-03-19|1926.T|
|2002-01-25|1942.T|2002-01-25|1942.T|
|2002-01-31|1942.T|2002-01-31|1942.T|
+----------+------+----------+------+

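I want a count of distinct rii values based on a lookback. If I give a lookback of 2, it should return the number of distinct rii over a set of days (a given rdi and the 2 days before that rdi).

So with a lookback of 2, my result should be as follows (for rdi = 2002-02-06, it should find the distinct rii among rows whose rdi is in (2002-02-06, 2002-02-05, 2002-02-04)):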

+----------+-------------+----------+------+
|       rdi|          rii|       udi|   uii|
+----------+-------------+----------+------+
|2002-02-06|1376.Q,1925.T|2002-02-06|1376.Q|
+----------+-------------+----------+------+
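
To make the lookback rule concrete, here is a minimal sketch in plain Python. It is not part of the original question. The function name `distinct_rii_in_window` is hypothetical, and the rows are a small subset of the sample table. It collects the distinct rii whose rdi falls between the target date minus the lookback and the target date, inclusive.

```python
from datetime import date, timedelta

# A few (rdi, rii) pairs copied from the sample table above.
rows = [
    (date(2002, 2, 6), "1376.Q"),
    (date(2002, 2, 28), "1376.Q"),
    (date(2002, 2, 5), "1925.T"),
    (date(2002, 2, 1), "1792.T"),
    (date(2002, 1, 31), "1942.T"),
]


def distinct_rii_in_window(rows, target, lookback):
    """Return the set of rii whose rdi lies in [target - lookback days, target]."""
    start = target - timedelta(days=lookback)
    return {rii for rdi, rii in rows if start <= rdi <= target}


# Lookback of 2 for rdi = 2002-02-06 covers 2002-02-04 through 2002-02-06.
window = distinct_rii_in_window(rows, date(2002, 2, 6), lookback=2)
print(sorted(window), len(window))  # prints ['1376.Q', '1925.T'] 2
```

The row dated 2002-02-01 falls outside the window, which is why only 1376.Q and 1925.T are reported for 2002-02-06.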

I tried the following query, but it does not give the desired output:

select count(distinct uii) as u,
  rdi,
  (select count(distinct rii) from `mytable` where rdi between DATE_SUB(rdi, INTERVAL 2 DAY) AND rdi) as r
  from `mytable`
  group by rdi 
  order by rdi;

See my fiddle here

1 answer:

Answer 0 (score: 3)

You can use a LEFT JOIN to relate each record to the records from the preceding days:

select t1.rdi, 
       group_concat(t2.rii) as rii, 
       t1.udi,
       count(distinct t2.uii)
from `mytable` as t1
left join `mytable` as t2
   on t2.rdi between DATE_SUB(t1.rdi, INTERVAL 2 DAY) AND t1.rdi
group by t1.rdi
order by t1.rdi;
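
The self-join idea can be tried without a MySQL server. The sketch below is an illustration under stated assumptions, not the answer's own demo. It uses Python's built-in sqlite3 module, so SQLite's `date(t1.rdi, '-2 days')` stands in for MySQL's `DATE_SUB(t1.rdi, INTERVAL 2 DAY)`. It also adds `distinct` inside `group_concat` and counts distinct rii, on the assumption that the question wants unique rii values per window. Only a few sample rows are loaded.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (rdi text, rii text, udi text, uii text)")

# A few rows from the sample data; in the sample, udi equals rdi and uii equals rii.
sample = [
    ("2002-02-06", "1376.Q"),
    ("2002-02-05", "1925.T"),
    ("2002-02-01", "1792.T"),
    ("2002-01-31", "1942.T"),
]
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [(d, i, d, i) for d, i in sample],
)

# Same join shape as the answer, with SQLite date arithmetic.
query = """
select t1.rdi,
       group_concat(distinct t2.rii) as rii,
       count(distinct t2.rii) as rii_count
from mytable as t1
left join mytable as t2
   on t2.rdi between date(t1.rdi, '-2 days') and t1.rdi
group by t1.rdi
order by t1.rdi
"""

# Split the concatenated ids into a set, since group_concat order is not guaranteed.
result = {rdi: (set(rii.split(",")), n) for rdi, rii, n in conn.execute(query)}
print(result["2002-02-06"])
```

ISO-formatted date strings compare correctly as text, which is why the `between` condition works on a text column in this sketch. In MySQL with a real DATE column, the original `DATE_SUB` expression plays the same role.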

Output: [image]

Demo here