MySQL window function with COUNT(DISTINCT)

Posted: 2019-07-31 15:21:57

Tags: mysql

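I want to write a window function in MySQL that gives a rolling 30-day count of unique IDs. More precisely, my database has many entries per day, stored as timestamps, with many different IDs. For each day I want to count how many distinct IDs connected, and also the total number of distinct IDs that were online during the past 30 days.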

Consider the following table:

CREATE TABLE `my_database` (
  `timestamp` BIGINT(20) UNSIGNED NOT NULL,
  `id` VARCHAR(32) NOT NULL);

INSERT INTO my_database (timestamp,id) VALUES (CURDATE(),1);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 1 DAY),2);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 2 DAY),1);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 2 DAY),3);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 29 DAY),4);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 300 DAY),2);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 1000 DAY),5);

It looks like this:

timestamp id
20190730    1
20190729    2
20190728    1
20190728    3
20190701    4
20181003    2
20161102    5

The result I want is:

date              count_day     count_30day

2019-07-30            1               4
2019-07-29            1               4
2019-07-28            2               3
2019-07-01            1               1
2018-10-03            1               1
2016-11-02            1               1

I don't know how to get the count_30day column. So far I have written the following:

SELECT DATE(a.`timestamp`) AS 'date',
    COUNT(DISTINCT a.id) AS 'count_day',
    COUNT(DISTINCT a.id) OVER (ORDER BY DATE(a.`timestamp`) ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS 'count_30day'
  FROM my_database AS a
 GROUP 
    BY DATE(a.`timestamp`)
 ORDER 
    BY DATE(a.`timestamp`) DESC

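But it doesn't work for the count_30day column. I've searched other questions, and as far as I can tell from the documentation the window-function syntax looks correct, but it obviously isn't, because this doesn't work. How do I write the window function correctly? Is there a better approach than COUNT(DISTINCT)? Thanks!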

1 answer:

Answer 0 (score: 0)

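ROWS ... PRECEDING counts rows, not days: ROWS BETWEEN 30 PRECEDING AND CURRENT ROW covers the current row and the 30 rows before it, whatever dates those rows have.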
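To see the difference, here is a sketch (MySQL 8.0 syntax, reusing the question's my_database table; the aliases d, dt and id_days_30 are illustrative) of a frame that really is measured in days: RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW. It still cannot produce count_30day directly, because, as far as I know, MySQL 8.0 rejects DISTINCT inside an aggregate used as a window function. The sketch can therefore only sum the per-day distinct counts, so an ID seen on several days within the window is counted more than once.

```sql
-- Sketch: a window frame measured in days instead of rows (MySQL 8.0+).
-- NOTE: this sums per-day distinct counts, so it is NOT count_30day:
-- an ID that appears on several days in the window is counted repeatedly.
SELECT d.dt AS `date`,
       d.count_day,
       SUM(d.count_day) OVER (
           ORDER BY d.dt
           RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW
       ) AS id_days_30
FROM (
    -- one row per calendar day with that day's distinct ID count
    SELECT DATE(`timestamp`) AS dt,
           COUNT(DISTINCT id) AS count_day
    FROM my_database
    GROUP BY DATE(`timestamp`)
) AS d
ORDER BY d.dt DESC;
```

On the sample rows the newest day gets 5 here instead of the desired 4, because ID 1 appears both on that day and two days earlier.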

You need a correlated subquery that filters on the date range instead:

SELECT DATE(a.`timestamp`) AS 'date',
    COUNT(DISTINCT a.id) AS 'count_day',
    MAX( (SELECT COUNT(DISTINCT ID) 
          FROM my_database db2 
          WHERE db2.timestamp between DATE_SUB(a.timestamp, INTERVAL 30 DAY)
                                  and a.timestamp
          )
       ) as count30
  FROM my_database AS a
 GROUP 
    BY DATE(a.`timestamp`)
 ORDER 
    BY DATE(a.`timestamp`) DESC
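An alternative sketch that avoids both the window function and the correlated subquery: aggregate per day first, then join each day to every raw row in its 30-day window and count distinct IDs there. Like the answer above, it assumes the window is the current day plus the 30 days before it (use INTERVAL 29 DAY if "30 days" should include the current day). It also wraps the BIGINT column in DATE() on both sides of BETWEEN so the comparison is date-to-date rather than relying on implicit conversion. The aliases d, w and dt are illustrative.

```sql
-- Sketch: rolling 30-day distinct count via a self-join (MySQL 5.7 / 8.0).
SELECT d.dt AS `date`,
       d.count_day,
       COUNT(DISTINCT w.id) AS count_30day   -- distinct IDs across the whole window
FROM (
    -- d: one row per calendar day with that day's distinct ID count
    SELECT DATE(`timestamp`) AS dt,
           COUNT(DISTINCT id) AS count_day
    FROM my_database
    GROUP BY DATE(`timestamp`)
) AS d
-- w: every raw row whose date falls in [dt - 30 days, dt];
-- each day always matches at least its own rows, so no day is dropped
JOIN my_database AS w
  ON DATE(w.`timestamp`) BETWEEN d.dt - INTERVAL 30 DAY AND d.dt
GROUP BY d.dt, d.count_day
ORDER BY d.dt DESC;
```

On the sample rows this should reproduce the expected counts (1/4, 1/4, 2/3, then 1/1 for the three older days), with the dates depending on what CURDATE() was when the rows were inserted.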