I want to retrieve the time period with the highest frequency/density of records from a table.
Suppose I have a log table like this:
datetime   | action | username | highest_time_slot
--------------------------------------------------
2013-09-30 | update | username |
2013-12-15 | update | username |
2014-03-01 | update | username | *
2014-03-02 | update | username | *
2014-03-03 | update | username | *
2014-03-05 | update | username | *
2015-05-20 | update | username |
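From this table you can see that the user was active at a higher frequency in the period between 2014-03-01 and 2014-03-05. Is there a sensible way to retrieve this period? Thanks for your help!
Answer 0 (score: 2)
Let's start with a table definition and some INSERT statements. This reflects your data as it was before you changed the question.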
create table log_test (
  datetime date not null,
  action varchar(15) not null,
  username varchar(15) not null,
  primary key (datetime, action, username)
);
insert into log_test values
('2013-09-30', 'update', 'username'),
('2013-12-15', 'update', 'username'),
('2014-03-01', 'update', 'username'),
('2014-03-02', 'update', 'username'),
('2014-03-03', 'update', 'username'),
('2014-03-05', 'update', 'username'),
('2015-05-20', 'update', 'username');
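Now we build a table of integers. This kind of table is useful in many ways; mine has several million rows. (There are ways to automate the insert statements.)
create table integers (
  n integer not null,
  primary key (n)
);
insert into integers values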
(0), (1), (2), (3), (4), (5), (6), (7), (8), (9),
(10), (11), (12), (13), (14), (15), (16), (17), (18), (19),
(20), (21), (22), (23), (24), (25), (26), (27), (28), (29),
(30), (31), (32), (33), (34), (35), (36), (37), (38), (39),
(40), (41), (42), (43), (44), (45), (46), (47), (48), (49);
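One way to automate those insert statements is to generate the SQL text with a small script. The following is only a sketch, not part of the original answer: the helper name `build_integers_insert` is hypothetical, and it assumes the table is called `integers` as defined above.

```python
def build_integers_insert(count, table="integers"):
    """Return one multi-row INSERT statement for the integers 0..count-1.

    Hypothetical helper for illustration; it only builds the SQL text,
    it does not connect to any database.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    rows = ", ".join(f"({i})" for i in range(count))
    return f"insert into {table} values {rows};"


# Build the same 50 rows (0 through 49) as the hand-written statement.
statement = build_integers_insert(50)
print(statement)
```

For a table with millions of rows you would probably emit the statement in batches rather than as one very long line.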
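This statement gives us the dates from log_test, along with the number of days in the "window" we want to look at. You need select distinct, because more than one user might have the same date.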
select distinct datetime, t.n
from log_test
cross join (select n from integers where n between 10 and 40) t
order by datetime, t.n;
datetime    n
--------------
2013-09-30  10
2013-09-30  11
2013-09-30  12
...
2015-05-20  39
2015-05-20  40
We can use that result as a derived table, and do date arithmetic on it.
select datetime period_start, datetime + interval t2.n day period_end
from (
  select distinct datetime, t.n
  from log_test
  cross join (select n from integers where n between 10 and 40) t
) t2
order by period_start, period_end;
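period_start  period_end
------------------------
2013-09-30    2013-10-10
2013-09-30    2013-10-11
2013-09-30    2013-10-12
...
2015-05-20    2015-06-28
2015-05-20    2015-06-29
These intervals are off by one; 2013-09-30 through 2013-10-10 covers 11 days, not 10. I'll leave fixing that to you.
The next version counts the number of "events" in each period. In your case, given the question as originally written, we just need to count the number of rows in each period.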
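To make the off-by-one concrete, here is a small sketch in Python (not the answer's SQL). The helper names `inclusive_days` and `window_end` are made up for this illustration; the idea is that an n-day window that includes its start date ends n - 1 days later, not n days later.

```python
from datetime import date, timedelta


def inclusive_days(start, end):
    """Number of calendar days from start through end, counting both ends."""
    return (end - start).days + 1


def window_end(start, n):
    """Last day of an n-day window whose first day is `start`."""
    return start + timedelta(days=n - 1)


start = date(2013, 9, 30)
# Adding 10 days, as the query does, spans 11 calendar days.
print(inclusive_days(start, start + timedelta(days=10)))  # 11
# Adding n - 1 days gives a window of exactly n days.
print(window_end(start, 10))  # 2013-10-09
```

In the queries, the equivalent change would be to add n - 1 days rather than n days when computing period_end.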
select username, t3.period_start, t3.period_end, count(datetime) num_rows
from log_test
inner join (
  select datetime period_start, datetime + interval t2.n day period_end
  from (
    select distinct datetime, t.n
    from log_test
    cross join (select n from integers where n between 10 and 40) t
  ) t2
  order by period_start, period_end
) t3
  on log_test.datetime between t3.period_start and t3.period_end
group by username, t3.period_start, t3.period_end
order by username, t3.period_start, t3.period_end;
username  period_start  period_end  num_rows
--------------------------------------------
username  2013-09-30    2013-10-10  1
username  2013-09-30    2013-10-11  1
username  2013-09-30    2013-10-12  1
...
username  2014-03-01    2014-03-11  4
username  2014-03-01    2014-03-12  4
...
username  2015-05-20    2015-06-28  1
username  2015-05-20    2015-06-29  1
Finally, we can apply a little arithmetic and get the density of each "window": the number of rows divided by the window length n.
select username,
       t3.period_start, t3.period_end, t3.n,
       count(datetime) num_rows,
       count(datetime)/t3.n density
from log_test
inner join (
  select datetime period_start, t2.n, datetime + interval t2.n day period_end
  from (
    select distinct datetime, t.n
    from log_test
    cross join (select n from integers where n between 10 and 40) t
  ) t2
  order by period_start, period_end
) t3
  on log_test.datetime between t3.period_start and t3.period_end
group by username, t3.period_start, t3.period_end, t3.n
order by username, density desc;
username  period_start  period_end  n   num_rows  density
---------------------------------------------------------
username  2014-03-01    2014-03-11  10  4         0.4000
username  2014-03-01    2014-03-12  11  4         0.3636
username  2014-03-01    2014-03-13  12  4         0.3333
...
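As a cross-check outside the database, the same brute-force scan can be sketched in Python. This is an illustration under assumptions, not part of the original answer: the function name `window_densities` is hypothetical, it handles the dates of a single user only, and it keeps the query's arithmetic as is (start date plus n days, both ends included), off-by-one and all.

```python
from datetime import date, timedelta


def window_densities(dates, min_days=10, max_days=40):
    """Score every window that starts on a logged date.

    Returns tuples (period_start, period_end, n, num_rows, density),
    sorted by density, highest first.
    """
    results = []
    for start in sorted(set(dates)):
        for n in range(min_days, max_days + 1):
            end = start + timedelta(days=n)
            # Same test as "datetime between period_start and period_end".
            num_rows = sum(1 for d in dates if start <= d <= end)
            results.append((start, end, n, num_rows, num_rows / n))
    results.sort(key=lambda row: row[4], reverse=True)
    return results


log_dates = [
    date(2013, 9, 30), date(2013, 12, 15), date(2014, 3, 1),
    date(2014, 3, 2), date(2014, 3, 3), date(2014, 3, 5),
    date(2015, 5, 20),
]

# Show the three densest windows.
for start, end, n, num_rows, density in window_densities(log_dates)[:3]:
    print(start, end, n, num_rows, round(density, 4))
```

With the sample data, the densest window starts on 2014-03-01 with n = 10, matching the first row of the query output.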
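Suggestions for improvement
You might want to change the date arithmetic. As it stands, these queries simply add "n" days to the dates in the test table. But that means the periods won't be symmetric around gaps. For example, the date 2014-03-01 appears after a long gap. As it stands, we don't try to evaluate the density of a "window" that ends on 2014-03-01 (that is, a "window" that approaches that first value from the dates before it). This might be worth thinking about for your application.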
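One possible way to look at the other direction is to score windows that end on each logged date instead of starting on it. The sketch below is in Python rather than SQL and is only an assumption about what such a change could look like; the function name `backward_window_densities` is hypothetical, and it again covers one user's dates.

```python
from datetime import date, timedelta


def backward_window_densities(dates, min_days=10, max_days=40):
    """Score every window that ends on a logged date.

    Mirrors the forward scan, but subtracts n days from each date.
    Returns tuples (period_start, period_end, n, num_rows, density).
    """
    results = []
    for end in sorted(set(dates)):
        for n in range(min_days, max_days + 1):
            start = end - timedelta(days=n)
            num_rows = sum(1 for d in dates if start <= d <= end)
            results.append((start, end, n, num_rows, num_rows / n))
    return results


log_dates = [
    date(2013, 9, 30), date(2013, 12, 15), date(2014, 3, 1),
    date(2014, 3, 2), date(2014, 3, 3), date(2014, 3, 5),
    date(2015, 5, 20),
]

# Compare the 10-day windows ending on 2014-03-01 and on 2014-03-05.
for row in backward_window_densities(log_dates):
    if row[2] == 10 and row[1] in (date(2014, 3, 1), date(2014, 3, 5)):
        print(row)
```

The window ending on 2014-03-01 contains a single row because of the long gap before it, while the one ending on 2014-03-05 contains four. Combining forward and backward results is one way to treat both sides of a gap evenly.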