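I have a login table containing the customer ID and the timestamp of each login (customerid, timestamp).
I want to get every customer ID that has at least three logins within sixty minutes. By the way, the login table is huge, so a self-join is not an option.
For example: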
customer id | timestamp
1 | 2016-08-16 00:00
2 | 2016-08-16 00:00
3 | 2016-08-16 00:00
1 | 2016-08-16 00:25
2 | 2016-08-16 01:25
3 | 2016-08-16 00:25
1 | 2016-08-16 00:47
2 | 2016-08-16 01:27
3 | 2016-08-16 02:25
3 | 2016-08-16 03:25
1 | 2016-08-16 01:05
For this example, the query should return only customerid 1. Any ideas?
Answer 0: (score: 1)
Tested with rextester: http://rextester.com/RMST24716 (thanks, TT.!)
CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL);
INSERT INTO loginTable (id, timestamp) values
( 1, '2016-08-16 00:00'),
( 2, '2016-08-16 00:00'),
( 3, '2016-08-16 00:00'),
( 1, '2016-08-16 00:25'),
( 2, '2016-08-16 01:25'),
( 3, '2016-08-16 00:25'),
( 1, '2016-08-16 00:47'),
( 2, '2016-08-16 01:27'),
( 3, '2016-08-16 02:25'),
( 3, '2016-08-16 03:25'),
( 1, '2016-08-16 01:05');
SELECT distinct a.id
FROM loginTable as a
join loginTable as b on a.id = b.id and a.timestamp < b.timestamp
join loginTable as c on b.id = c.id and b.timestamp < c.timestamp
where Datediff(minute, a.timestamp, c.timestamp) <= 60;
Answer 1: (score: 1)
Hope this helps (http://rextester.com/CTR13554):
SELECT a.id, a.timestamp, COUNT(DISTINCT b.timestamp)
FROM loginTable a
JOIN loginTable b ON a.id = b.id AND a.timestamp <= b.timestamp
JOIN loginTable c ON a.id = c.id AND a.timestamp <= c.timestamp
WHERE 1=1
AND ABS(DATEDIFF(minute,a.timestamp,b.timestamp)) <= 60
AND ABS(DATEDIFF(minute,a.timestamp,c.timestamp)) <= 60
GROUP BY a.id, a.timestamp
HAVING COUNT(DISTINCT b.timestamp) >= 3
By the way, in your example customer 1 has three logins within one hour twice: [00:00; 00:25; 00:47] and [00:25; 00:47; 01:05].
Here is a script to quickly test the query above:
CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL)
INSERT INTO loginTable (id, timestamp)
SELECT 1, '2016-08-16 00:00'
UNION SELECT 2, '2016-08-16 00:00'
UNION SELECT 3, '2016-08-16 00:00'
UNION SELECT 1, '2016-08-16 00:25'
UNION SELECT 2, '2016-08-16 01:25'
UNION SELECT 3, '2016-08-16 00:25'
UNION SELECT 1, '2016-08-16 00:47'
UNION SELECT 2, '2016-08-16 01:27'
UNION SELECT 3, '2016-08-16 02:25'
UNION SELECT 3, '2016-08-16 03:25'
UNION SELECT 1, '2016-08-16 01:05'
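Answer 2: (score: 1)
I could only test this on rextester, and for MSSQL the following also seems to work. Hopefully your MSSQL version supports analytic functions too.
In that case no self-join is needed and the table is scanned only once.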
CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL)
INSERT INTO loginTable (id, timestamp)
SELECT 1, '2016-08-16 00:00'
UNION SELECT 2, '2016-08-16 00:00'
UNION SELECT 3, '2016-08-16 00:00'
UNION SELECT 1, '2016-08-16 00:25'
UNION SELECT 2, '2016-08-16 01:25'
UNION SELECT 3, '2016-08-16 00:25'
UNION SELECT 1, '2016-08-16 00:47'
UNION SELECT 2, '2016-08-16 01:27'
UNION SELECT 3, '2016-08-16 02:25'
UNION SELECT 3, '2016-08-16 03:25'
UNION SELECT 1, '2016-08-16 01:05';
select id, min_t, max_t from (
select id,
min(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as min_t,
max(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as max_t,
count(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as num_t
from loginTable
) ts_data
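where ABS(DATEDIFF(minute,min_t,max_t)) <= 60 and num_t=3;
(Thanks to @Salvador for sharing the test script.)
Explanation
The idea is to scan the login table once in timestamp order and, for each id, keep the last three occurrences in memory (the current one included). If the minimum and maximum of those three timestamps are within 60 minutes of each other, we almost have the result.
Finally, we have to handle a "corner case": when we reach a customer's first or second login, the minimum and maximum timestamps can also fall within 60 minutes (for the first login they are identical).
Those rows do not satisfy the OP's requirement, which asks for three distinct logins, so we also count the logins in the window and make sure there are three (num_t=3).
Edited (thanks again to @Salvador for the warning)
The first version had a bug: in the window specification I wrote "rows between 3 preceding". I do need to look at three rows, but the current row is included, so it should have been "rows between 2 preceding".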
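As a rough illustration of the explanation above (not the answerer's code), the same scan can be sketched in Python: keep a rolling buffer of the last three logins per id and compare its oldest and newest entries. The names `customers_with_burst`, `needed` and `window` are hypothetical, and the sketch assumes the rows fit in memory as (id, datetime) pairs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def customers_with_burst(logins, needed=3, window=timedelta(minutes=60)):
    """Return sorted ids that have `needed` consecutive logins within `window`."""
    # Per-id rolling buffer; plays the role of "rows between 2 preceding and current row".
    last_seen = defaultdict(lambda: deque(maxlen=needed))
    hits = set()
    # Single pass in timestamp order, like the ORDER BY inside the window clause.
    for cust_id, ts in sorted(logins, key=lambda row: row[1]):
        recent = last_seen[cust_id]
        recent.append(ts)
        # Mirrors num_t = 3 and the min_t / max_t comparison.
        if len(recent) == needed and recent[-1] - recent[0] <= window:
            hits.add(cust_id)
    return sorted(hits)


def parse(rows):
    """Turn (id, 'YYYY-MM-DD HH:MM') pairs into (id, datetime) pairs."""
    return [(cid, datetime.strptime(ts, "%Y-%m-%d %H:%M")) for cid, ts in rows]


# Sample rows from the question.
SAMPLE = parse([
    (1, "2016-08-16 00:00"), (2, "2016-08-16 00:00"), (3, "2016-08-16 00:00"),
    (1, "2016-08-16 00:25"), (2, "2016-08-16 01:25"), (3, "2016-08-16 00:25"),
    (1, "2016-08-16 00:47"), (2, "2016-08-16 01:27"), (3, "2016-08-16 02:25"),
    (3, "2016-08-16 03:25"), (1, "2016-08-16 01:05"),
])

print(customers_with_burst(SAMPLE))
```

On the sample data only customer 1 qualifies, since 00:00, 00:25 and 00:47 span 47 minutes. The length check is what excludes a customer's first and second login, which is the corner case described above.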
Answer 3: (score: 0)
Use a self-join:
{{1}}
Answer 4: (score: 0)
This query returns the customers who logged in at least three times within the last 60 minutes:
SELECT customerid FROM
(SELECT customerid, count(*) as loginnumber FROM LoginTable
WHERE [timestamp] > DATEADD(minute, -60, GetDate())
GROUP BY customerid) LT
WHERE loginnumber >= 3
Answer 5: (score: 0)
I adapted my PgSQL solution to MSSQL. You can see the result at http://rextester.com/CBPW42897
WITH tbl AS (
SELECT id
, IIF( DATEDIFF(minute,
lag(ts, 1) OVER (PARTITION BY id ORDER BY ts asc ),
ts )<=60,
1, 0) as freq60
FROM loginTable
)
SELECT id FROM tbl
GROUP BY tbl.id HAVING SUM(freq60) >=3
ORDER BY tbl.id
I like MSSQL's convenient IIF and DATEDIFF functions, but having to spell out the same window every time is a bit awkward.
The following code is for PgSQL:
with tbl as (
select id
,case when extract(epoch from (ts - lag(ts, 1) over w) ) < 3600 then 1
else 0
end as freq60
from loginTable
window w as (partition by id order by ts asc )
)
select id
from tbl
group by tbl.id having sum(freq60) >=3
order by tbl.id
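The idea is simple. Build a window per customer ID with its rows ordered by time. For each row, subtract the previous row's timestamp from the current one to get the interval. Return 1 if the interval is within 60 minutes and 0 otherwise, then aggregate the flags and return the ids whose freq60 sum is >= 3. The sort happens only once, inside the window function. On a table with many thousands of rows this is much faster than a self-join.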
id | freq60 | prev_ts | ts | intval
1 | 0 | null | 2016-08-16 00:00:00 |
1 | 1 | 2016-08-16 00:00:00 | 2016-08-16 00:25:00 | 25
1 | 1 | 2016-08-16 00:25:00 | 2016-08-16 00:47:00 | 22
1 | 1 | 2016-08-16 00:47:00 | 2016-08-16 01:05:00 | 18
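To make the per-row arithmetic concrete, here is a small Python sketch (not part of the original answer) that reproduces the freq60 and interval columns for customer 1. The function name `gap_flags` and its `limit_minutes` parameter are hypothetical. It follows the MSSQL variant, which flags gaps of 60 minutes or less.

```python
from datetime import datetime


def gap_flags(timestamps, limit_minutes=60):
    """Return (freq60, prev_ts, ts, gap_minutes) per login, ordered by time."""
    rows = []
    prev = None
    for ts in sorted(timestamps):
        # Gap to the previous login, like ts - lag(ts, 1); None for the first row.
        gap = None if prev is None else (ts - prev).total_seconds() / 60
        flag = 1 if gap is not None and gap <= limit_minutes else 0
        rows.append((flag, prev, ts, gap))
        prev = ts
    return rows


# Customer 1 from the question: gaps of 25, 22 and 18 minutes.
customer_1 = [
    datetime(2016, 8, 16, 0, 0),
    datetime(2016, 8, 16, 0, 25),
    datetime(2016, 8, 16, 0, 47),
    datetime(2016, 8, 16, 1, 5),
]
rows = gap_flags(customer_1)
freq60_total = sum(flag for flag, *_ in rows)

# A made-up customer whose logins are evenly spaced 50 minutes apart.
spaced = [
    datetime(2016, 8, 16, 0, 0),
    datetime(2016, 8, 16, 0, 50),
    datetime(2016, 8, 16, 1, 40),
    datetime(2016, 8, 16, 2, 30),
]
spaced_total = sum(flag for flag, *_ in gap_flags(spaced))

print(freq60_total, spaced_total)
```

Both totals come out as 3. The sum counts gaps between consecutive logins, so the evenly spaced customer passes the `>= 3` filter even though no three of those logins fall inside one 60-minute span. Conversely, a customer with exactly three logins inside an hour only reaches a sum of 2. Comparing each row with the login two rows back, for example with lag(ts, 2), would test the original requirement more directly.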
Answer 6: (score: 0)
A simple way to find the cases that do not cross an hour boundary is:
select
id,
datepart(yy,timestamp) as yy,
datepart(mm,timestamp) as mm,
datepart(dd,timestamp) as dd,
datepart(hh,timestamp) as hh,
count(*)
from
logintable
group by
id,
datepart(yy,timestamp),
datepart(mm,timestamp),
datepart(dd,timestamp),
datepart(hh,timestamp)
having
count(*) >= 3
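If your table is very large, you could first cut it down to the customers who logged in at least three times in a single day and then self-join that subset. It will still miss logins that span a date boundary, but it is a simple solution that lets you make progress while you work on something more sophisticated.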
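For illustration only, the hour-bucket grouping can be sketched in Python as below. The names `busy_hours` and `needed` are hypothetical, and the second data set is made up to show the boundary limitation mentioned above.

```python
from collections import Counter
from datetime import datetime


def busy_hours(logins, needed=3):
    """Count logins per (id, calendar hour) and keep buckets with >= needed."""
    buckets = Counter(
        # Truncating to the hour plays the role of grouping by yy, mm, dd, hh.
        (cust_id, ts.replace(minute=0, second=0, microsecond=0))
        for cust_id, ts in logins
    )
    return {key: n for key, n in buckets.items() if n >= needed}


# Customers 1 and 2 from the question.
logins = [
    (1, datetime(2016, 8, 16, 0, 0)),
    (1, datetime(2016, 8, 16, 0, 25)),
    (1, datetime(2016, 8, 16, 0, 47)),
    (1, datetime(2016, 8, 16, 1, 5)),
    (2, datetime(2016, 8, 16, 0, 0)),
    (2, datetime(2016, 8, 16, 1, 25)),
    (2, datetime(2016, 8, 16, 1, 27)),
]

# Made-up customer 4: three logins within 30 minutes, split across two hours.
straddling = [
    (4, datetime(2016, 8, 16, 0, 40)),
    (4, datetime(2016, 8, 16, 0, 55)),
    (4, datetime(2016, 8, 16, 1, 10)),
]

print(busy_hours(logins))
print(busy_hours(straddling))
```

Customer 1 is found because three of its logins land in the 00:00 hour. Customer 4 is missed because its logins are split 2 and 1 across the hour boundary.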