Count unique values per day when the timestamp includes hours

Time: 2020-04-29 15:02:49

Tags: sql clickhouse

I have a dataset:

timestamp                event            user
2020-04-28 20:07:55.503  log_in           john
2020-04-28 20:08:01.996  log_out          john
2020-04-28 20:08:02.470  log_in           john
2020-04-28 20:08:03.996  log_out          john
2020-04-28 20:08:05.729  log_failed       john
2020-04-29 10:06:45.683  log_in           mark
2020-04-29 10:08:58.299  password_change  mark
2020-04-30 14:19:24.921  log_in           jeff
2020-04-30 14:20:31.266  log_out          jeff
2020-04-30 14:21:44.438  create_new_user  jeff
2020-04-30 14:22:44.455  create_new_user  jeff

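How can I write an SQL query that counts the unique events for each day? The part I find least clear is how to handle the hours in the timestamp. The desired result looks like this: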

timestamp   count
2020-04-28  3
2020-04-29  2
2020-04-30  3

2 answers:

Answer 0: (score: 1)

I think the ClickHouse syntax is:

select distinct toDate(timestamp), event
from t;

Edit:

If you want to count the events, use count(distinct):

select toDate(timestamp), count(distinct event)
from t
group by toDate(timestamp);
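
To see why the time-of-day part does not get in the way: toDate drops it, so every row from the same calendar day ends up with the same group key. A minimal sketch on a single literal taken from the question (the alias d is only illustrative, and the date is evaluated in the server's time zone):

```sql
-- toDateTime64(..., 3) parses the string with millisecond precision;
-- toDate then keeps only the calendar date and discards the time.
SELECT toDate(toDateTime64('2020-04-28 20:07:55.503', 3)) AS d;
```

This should return 2020-04-28, which is the value the group by above collapses all five of john's rows onto.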

Answer 1: (score: 0)

create table xx(timestamp DateTime64(3), event String, user String) Engine=Memory;
insert into xx values
('2020-04-28 20:07:55.503','log_in', 'john'),
('2020-04-28 20:08:01.996','log_out','john'),
('2020-04-28 20:08:02.470','log_in','john'),
('2020-04-28 20:08:03.996','log_out','john'),
('2020-04-28 20:08:05.729','log_failed','john'),
('2020-04-29 10:06:45.683','log_in','mark'),
('2020-04-29 10:08:58.299','password_change','mark'),
('2020-04-30 14:19:24.921','log_in','jeff'),
('2020-04-30 14:20:31.266','log_out','jeff'),
('2020-04-30 14:21:44.438','create_new_user','jeff'),
('2020-04-30 14:22:44.455','create_new_user','jeff');

SELECT
    toDate(timestamp) AS d,
    uniq(event)
FROM xx
GROUP BY d

┌──────────d─┬─uniq(event)─┐
│ 2020-04-28 │           3 │
│ 2020-04-29 │           2 │
│ 2020-04-30 │           3 │
└────────────┴─────────────┘
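
One caveat, offered tentatively: according to the ClickHouse documentation, uniq is an approximate aggregate, while uniqExact is exact. With only a handful of distinct events per day the two should agree, but if exact counts matter, a variant along these lines may be safer. The ORDER BY is added only because GROUP BY on its own does not guarantee row order, and the alias events is made up for this sketch:

```sql
SELECT
    toDate(timestamp) AS d,
    uniqExact(event) AS events  -- exact number of distinct events per day
FROM xx
GROUP BY d
ORDER BY d;
```

The trade-off is that uniqExact keeps every distinct value in memory, so on very high-cardinality columns uniq may still be the more practical choice.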