Select all customers who logged in at least three times within a time interval

Time: 2016-10-24 18:13:27

Tags: sql-server tsql datetime sql-server-2014

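I have a login table that contains the customer's ID and the timestamp of each login (customerid, timestamp).

I want to get all customer IDs that logged in at least three times within sixty minutes. By the way, the login table is huge, so a self-join is not an option.

For example: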

customer id | timestamp
1           | 2016-08-16 00:00
2           | 2016-08-16 00:00
3           | 2016-08-16 00:00
1           | 2016-08-16 00:25
2           | 2016-08-16 01:25
3           | 2016-08-16 00:25
1           | 2016-08-16 00:47
2           | 2016-08-16 01:27
3           | 2016-08-16 02:25
3           | 2016-08-16 03:25
1           | 2016-08-16 01:05

For this example, the query should return only customerid 1. Any ideas?

7 answers:

Answer 0 (score: 1)

Tested with rextester: http://rextester.com/RMST24716 (thanks TT.!)

 CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL);

 INSERT INTO loginTable (id, timestamp) values 
( 1, '2016-08-16 00:00'),
( 2, '2016-08-16 00:00'),
( 3, '2016-08-16 00:00'),
( 1, '2016-08-16 00:25'),
( 2, '2016-08-16 01:25'),
( 3, '2016-08-16 00:25'),
( 1, '2016-08-16 00:47'),
( 2, '2016-08-16 01:27'),
( 3, '2016-08-16 02:25'),
( 3, '2016-08-16 03:25'),
( 1, '2016-08-16 01:05');


SELECT distinct a.id
FROM loginTable as a 
join loginTable as b on a.id = b.id and a.timestamp < b.timestamp
join loginTable as c on b.id = c.id and b.timestamp < c.timestamp
where Datediff(minute, a.timestamp, c.timestamp) <= 60;

Answer 1 (score: 1)

Hope it helps (http://rextester.com/CTR13554):

SELECT a.id, a.timestamp, COUNT(DISTINCT b.timestamp)
FROM loginTable a
JOIN loginTable b ON a.id = b.id AND a.timestamp <= b.timestamp
JOIN loginTable c ON a.id = c.id AND a.timestamp <= c.timestamp
WHERE 1=1
  AND ABS(DATEDIFF(minute,a.timestamp,b.timestamp)) <= 60
  AND ABS(DATEDIFF(minute,a.timestamp,c.timestamp)) <= 60
GROUP BY a.id, a.timestamp
HAVING COUNT(DISTINCT b.timestamp) >= 3

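By the way, in your example customer 1 logged in three times within one hour on two occasions: [00:00; 00:25; 00:47] and [00:25; 00:47; 01:05].

Here is some code to quickly test the query above: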

CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL)

INSERT INTO loginTable (id, timestamp)
      SELECT 1, '2016-08-16 00:00'
UNION SELECT 2, '2016-08-16 00:00'
UNION SELECT 3, '2016-08-16 00:00'
UNION SELECT 1, '2016-08-16 00:25'
UNION SELECT 2, '2016-08-16 01:25'
UNION SELECT 3, '2016-08-16 00:25'
UNION SELECT 1, '2016-08-16 00:47'
UNION SELECT 2, '2016-08-16 01:27'
UNION SELECT 3, '2016-08-16 02:25'
UNION SELECT 3, '2016-08-16 03:25'
UNION SELECT 1, '2016-08-16 01:05'

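Answer 2 (score: 1)

I could only test this on rextester, but the following also seems to work for MSSQL. Hopefully your MSSQL version supports analytic functions as well.

In this case no self-join is needed, and the table is scanned only once.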

 CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL)

 INSERT INTO loginTable (id, timestamp)
       SELECT 1, '2016-08-16 00:00'
 UNION SELECT 2, '2016-08-16 00:00'
 UNION SELECT 3, '2016-08-16 00:00'
 UNION SELECT 1, '2016-08-16 00:25'
 UNION SELECT 2, '2016-08-16 01:25'
 UNION SELECT 3, '2016-08-16 00:25'
 UNION SELECT 1, '2016-08-16 00:47'
 UNION SELECT 2, '2016-08-16 01:27'
 UNION SELECT 3, '2016-08-16 02:25'
 UNION SELECT 3, '2016-08-16 03:25'
 UNION SELECT 1, '2016-08-16 01:05';

 select id,  min_t, max_t from (
 select id,
         min(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as min_t, 
         max(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as max_t,
         count(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as num_t
   from loginTable
 ) ts_data    
  where ABS(DATEDIFF(minute,min_t,max_t)) <= 60 and num_t=3;

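(Thanks to @Salvador for sharing the test script.)

Explanation: the idea is to scan the login table once in timestamp order and, for each id, keep the last three logins (the current one included) in memory. If the minimum and maximum of those three timestamps lie within 60 minutes of each other, we almost have the result.

Finally, we have to handle a corner case: when we reach a customer's first or second login, the minimum and maximum timestamps can also lie within 60 minutes of each other (for the first login they are identical).

Those rows do not satisfy the OP's requirement (he talks about 3 distinct logins), so we have to count the logins in the window and make sure there are 3 (num_t=3).

Edited (thanks again to @Salvador for the warning):

The first version had a bug: in the window specification I wrote "3 preceding". I do need to look at three rows, but the current row is one of them, so the frame should be "rows between 2 preceding and current row".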
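
As a sketch of the same three-row idea, the query below compares each login with the login two rows earlier for the same customer, using LAG with an offset of 2 (available in SQL Server 2012 and later). It assumes the loginTable(id, timestamp) definition from the test script above; the alias ts_two_back is only an illustrative name. LAG yields NULL for a customer's first and second login, so the corner case is filtered out without a separate count.

```sql
-- Sketch, assuming loginTable(id, timestamp) as created in the test script.
-- ts_two_back is a hypothetical alias: the login two rows before the current one.
SELECT DISTINCT id
FROM (
    SELECT id,
           [timestamp],
           LAG([timestamp], 2) OVER (PARTITION BY id ORDER BY [timestamp]) AS ts_two_back
    FROM loginTable
) AS t
-- A NULL ts_two_back (fewer than two earlier logins) makes DATEDIFF return NULL,
-- so those rows never pass the filter.
WHERE DATEDIFF(minute, ts_two_back, [timestamp]) <= 60;
```

On the sample data this should return only id 1, because the third login of customer 1 (00:47) is 47 minutes after the first (00:00).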

Answer 3 (score: 0)

Using a self-join:

{{1}}

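Answer 4 (score: 0)

This query gets the customers who logged in at least three times within the last 60 minutes (counted back from the current time):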

SELECT customerid FROM 
(SELECT customerid, count(*) as loginnumber FROM LoginTable
 -- WHERE must come before GROUP BY
 WHERE [timestamp] > DATEADD(minute, -60, GetDate())
 GROUP BY customerid) LT
WHERE loginnumber >= 3

Answer 5 (score: 0)

I adapted my PgSQL solution to MSSQL; you can see the result at http://rextester.com/CBPW42897:

WITH tbl AS (
  SELECT id 
    , IIF( DATEDIFF(minute, 
                    lag(ts, 1) OVER (PARTITION BY id ORDER BY ts asc ), 
                    ts )<=60, 
           1, 0) as freq60
  FROM loginTable
)
SELECT id FROM tbl
  GROUP BY tbl.id HAVING SUM(freq60) >=3
  ORDER BY tbl.id

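I like MSSQL's convenient IIF and DATEDIFF functions, but having to specify the same window every time is a bit awkward.

The following code works for PgSQL: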

with tbl as (
  select cust_id 
    ,case when extract(epoch from (ts - lag(ts, 1) over w) ) < 3600 then 1 
          else 0
     end as freq60
  from loginTable
  window w as (partition by cust_id order by ts asc ) 
)
select cust_id 
from tbl
  group by tbl.cust_id having sum(freq60) >=3
  order by tbl.cust_id

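The idea is simple. Create a window per customer ID and order its rows by time. For each row, subtract the previous row's timestamp from the current timestamp to get the interval; return 1 if the interval is within 60 minutes and 0 otherwise, then aggregate the result. Return the ids whose freq60 sum is >= 3. The sort is done only once, inside the window function. When processing thousands of records, this is several times faster than a self-join.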

id | freq60 | prev_ts             | ts                  | intval
1  | 0      | null                | 2016-08-16 00:00:00 |
1  | 1      | 2016-08-16 00:00:00 | 2016-08-16 00:25:00 | 25
1  | 1      | 2016-08-16 00:25:00 | 2016-08-16 00:47:00 | 22
1  | 1      | 2016-08-16 00:47:00 | 2016-08-16 01:05:00 | 18
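
Intermediate rows like the ones above could be produced with a query along these lines. This is only a sketch: it assumes the column is named ts, as in the MSSQL query of this answer (the test scripts elsewhere in the thread call it timestamp), and it restricts the output to customer 1 for readability.

```sql
-- Sketch, assuming loginTable(id, ts). prev_ts and intval are illustrative aliases.
SELECT id,
       -- For the first login prev_ts is NULL, the comparison is unknown,
       -- and IIF falls through to 0.
       IIF(DATEDIFF(minute,
                    LAG(ts, 1) OVER (PARTITION BY id ORDER BY ts),
                    ts) <= 60, 1, 0)                          AS freq60,
       LAG(ts, 1) OVER (PARTITION BY id ORDER BY ts)          AS prev_ts,
       ts,
       DATEDIFF(minute,
                LAG(ts, 1) OVER (PARTITION BY id ORDER BY ts),
                ts)                                           AS intval
FROM loginTable
WHERE id = 1
ORDER BY ts;
```

Each flag describes a pair of consecutive logins, so the sum counts short gaps rather than logins inside a single 60-minute span; the two can differ on other data, which is worth checking before relying on the aggregate.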

Answer 6 (score: 0)

A simple way to find the cases that do not cross an hour boundary is:

select
  id,
  datepart(yy,timestamp) as yy,
  datepart(mm,timestamp) as mm,
  datepart(dd,timestamp) as dd,
  datepart(hh,timestamp) as hh,
  count(*)
from
  logintable
group by
  id,
  datepart(yy,timestamp),
  datepart(mm,timestamp),
  datepart(dd,timestamp),
  datepart(hh,timestamp)
having
  count(*) >= 3

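If your table is very large, you could first cut it down to the customers who logged in at least three times on a single day, and then self-join. That will still miss logins that span a date boundary, but it is a simple solution that lets you move forward while you work on something more sophisticated.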
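
The two-step approach described above might look roughly like the following. It is a sketch under assumptions, not a tuned query: it uses the loginTable(id, timestamp) definition from the earlier test scripts, and the CTE names candidates and narrowed are made up for illustration.

```sql
-- Sketch, assuming loginTable(id, timestamp).
WITH candidates AS (
    -- Step 1: customer/day pairs with at least three logins on that day.
    SELECT id, CAST([timestamp] AS date) AS login_day
    FROM loginTable
    GROUP BY id, CAST([timestamp] AS date)
    HAVING COUNT(*) >= 3
),
narrowed AS (
    -- Keep only the logins that belong to a candidate customer/day.
    SELECT t.id, t.[timestamp]
    FROM loginTable AS t
    JOIN candidates AS d
      ON d.id = t.id
     AND d.login_day = CAST(t.[timestamp] AS date)
)
-- Step 2: self-join the much smaller set to find three ordered logins
-- whose first and last are at most 60 minutes apart.
SELECT DISTINCT a.id
FROM narrowed AS a
JOIN narrowed AS b ON a.id = b.id AND a.[timestamp] < b.[timestamp]
JOIN narrowed AS c ON b.id = c.id AND b.[timestamp] < c.[timestamp]
WHERE DATEDIFF(minute, a.[timestamp], c.[timestamp]) <= 60;
```

As the answer notes, a run of three logins that straddles midnight is not found this way, because neither day reaches three logins on its own.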