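The following User History table contains one record for every day a given user has accessed a website (in a 24-hour UTC period). It has many thousands of records, but only one record per day per user. If the user has not accessed the website on a given day, no record is generated.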
Id      UserId  CreationDate
------  ------  ------------
750997  12      2009-07-07 18:42:20.723
750998  15      2009-07-07 18:42:20.927
751000  19      2009-07-07 18:42:22.283
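What I'm looking for is a SQL query on this table, with good performance, that tells me which users have accessed the website for (n) continuous days without missing a day.
In other words, how many users have (n) records in this table with sequential (day-before or day-after) dates? If any day is missing from the sequence, the sequence is broken and should restart at 1; we're looking for users who have achieved a continuous number of days here with no gaps.
Any resemblance between this query and a particular Stack Overflow badge is purely coincidental, of course... :)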
Answer 0 (score: 147)
How about (and please make sure the previous statement ended with a semicolon):
WITH numberedrows
AS (SELECT ROW_NUMBER() OVER (PARTITION BY UserID
ORDER BY CreationDate)
- DATEDIFF(day,'19000101',CreationDate) AS TheOffset,
CreationDate,
UserID
FROM tablename)
SELECT MIN(CreationDate),
MAX(CreationDate),
COUNT(*) AS NumConsecutiveDays,
UserID
FROM numberedrows
GROUP BY UserID,
TheOffset
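The idea is that if we have a list of the days (as numbers) and a row_number, then every missed day makes the offset between the two lists change. Within a run of consecutive days the offset stays constant, so we're looking for ranges of rows that share the same offset.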
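To make the offset trick concrete, here is a minimal Python sketch of the same grouping. It is not part of the original answer: the function name consecutive_runs and the sample data are made up for illustration, and it assumes at most one visit per user per day, as stated in the question.

```python
from collections import defaultdict
from datetime import date


def consecutive_runs(visits):
    """visits: iterable of (user_id, day), at most one entry per user per day.

    Returns a list of (user_id, first_day, last_day, num_days) tuples,
    one per run of consecutive days.
    """
    by_user = defaultdict(list)
    for user_id, day in visits:
        by_user[user_id].append(day)

    runs = []
    for user_id, days in by_user.items():
        groups = defaultdict(list)
        for row_number, day in enumerate(sorted(days), start=1):
            # Same idea as ROW_NUMBER() - DATEDIFF(day, '19000101', CreationDate):
            # the difference stays constant while days are consecutive and
            # changes as soon as a day is skipped.
            offset = row_number - day.toordinal()
            groups[offset].append(day)
        for run in groups.values():
            runs.append((user_id, min(run), max(run), len(run)))
    return runs


# Hypothetical sample data: user 12 visits three days in a row, then skips a day.
sample = [
    (12, date(2009, 7, 7)), (12, date(2009, 7, 8)), (12, date(2009, 7, 9)),
    (12, date(2009, 7, 11)),
    (15, date(2009, 7, 7)),
]
for run in consecutive_runs(sample):
    print(run)
```

Grouping by (user, offset) here plays the role of GROUP BY UserID, TheOffset in the query.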
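You could add "ORDER BY NumConsecutiveDays DESC" at the end of this, or use "HAVING count(*) > 14" as a threshold...
I haven't tested this, though - I'm just writing it off the top of my head. Hopefully it works in SQL2005 and later.
...and it would be helped a great deal by an index on tablename(UserID, CreationDate).
Edit: It turns out Offset is a reserved word, so I used TheOffset instead.
Edit: The suggestion to use COUNT(*) is very valid - I should have done that in the first place but wasn't really thinking. Previously it used datediff(day, min(CreationDate), max(CreationDate)) instead.
Rob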
Answer 1 (score: 69)
The answer is obviously:
SELECT DISTINCT UserId
FROM UserHistory uh1
WHERE (
SELECT COUNT(*)
FROM UserHistory uh2
WHERE uh2.CreationDate
BETWEEN uh1.CreationDate AND DATEADD(d, @days, uh1.CreationDate)
) = @days OR UserId = 52551
Edit:
Okay, here's my serious answer:
DECLARE @days int
DECLARE @seconds bigint
SET @days = 30
SET @seconds = (@days * 24 * 60 * 60) - 1
SELECT DISTINCT UserId
FROM (
SELECT uh1.UserId, Count(uh1.Id) as Conseq
FROM UserHistory uh1
INNER JOIN UserHistory uh2 ON uh2.CreationDate
BETWEEN uh1.CreationDate AND
DATEADD(s, @seconds, DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate), 0))
AND uh1.UserId = uh2.UserId
GROUP BY uh1.Id, uh1.UserId
) as Tbl
WHERE Conseq >= @days
Edit:
[Jeff Atwood] This is a great, fast solution and deserves to be accepted, but Rob Farley's solution is also excellent and arguably even faster (!). Please check it out too!
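Answer 2 (score: 18)
If you can change the table schema, I'd suggest adding a column LongestStreak to the table and setting it to the number of consecutive days ending at CreationDate. It's easy to update the table at login time (similar to what you are already doing: if no row exists for the current day, you check whether a row exists for the previous day. If it does, you store the previous row's LongestStreak plus one in the new row; otherwise, you set it to 1.)
Once this column has been added, the query is obvious: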
LongestStreak
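As a rough illustration of the update rule described in this answer, here is a small Python sketch. Everything in it is hypothetical: an in-memory dict stands in for the table, and the names record_visit and users_with_streak are invented for the example rather than taken from the answer.

```python
from datetime import date, timedelta


def record_visit(history, user_id, today):
    """history maps (user_id, day) -> streak length ending on that day.

    Mirrors the login-time rule: if today's row already exists, do nothing;
    otherwise continue yesterday's streak or start a new one at 1.
    """
    key = (user_id, today)
    if key in history:
        return history[key]
    previous = history.get((user_id, today - timedelta(days=1)))
    history[key] = previous + 1 if previous is not None else 1
    return history[key]


def users_with_streak(history, n):
    """Users that have at least one row whose stored streak is >= n."""
    return sorted({user for (user, _day), streak in history.items() if streak >= n})


# Hypothetical usage: two consecutive days, then a gap.
history = {}
record_visit(history, 12, date(2009, 7, 7))
record_visit(history, 12, date(2009, 7, 8))
record_visit(history, 12, date(2009, 7, 10))
print(users_with_streak(history, 2))
```

The point of the design is that the expensive work is done once per login, so finding users with a streak of n days becomes a simple filter on the stored value.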
Answer 3 (score: 6)
Some very expressive SQL:
select
userId,
dbo.MaxConsecutiveDates(CreationDate) as blah
from
dbo.Logins
group by
userId
This assumes you have a user defined aggregate function along these lines (beware, this is buggy):
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Runtime.InteropServices;
namespace SqlServerProject1
{
[StructLayout(LayoutKind.Sequential)]
[Serializable]
internal struct MaxConsecutiveState
{
public int CurrentSequentialDays;
public int MaxSequentialDays;
public SqlDateTime LastDate;
}
[Serializable]
[SqlUserDefinedAggregate(
Format.Native,
IsInvariantToNulls = true, //optimizer property
IsInvariantToDuplicates = false, //optimizer property
IsInvariantToOrder = false) //optimizer property
]
[StructLayout(LayoutKind.Sequential)]
public class MaxConsecutiveDates
{
/// <summary>
/// The variable that holds the intermediate result of the aggregation
/// </summary>
private MaxConsecutiveState _intermediateResult;
/// <summary>
/// Initialize the internal data structures
/// </summary>
public void Init()
{
_intermediateResult = new MaxConsecutiveState { LastDate = SqlDateTime.MinValue, CurrentSequentialDays = 0, MaxSequentialDays = 0 };
}
/// <summary>
/// Accumulate the next value, not if the value is null
/// </summary>
/// <param name="value"></param>
public void Accumulate(SqlDateTime value)
{
if (value.IsNull)
{
return;
}
int sequentialDays = _intermediateResult.CurrentSequentialDays;
int maxSequentialDays = _intermediateResult.MaxSequentialDays;
DateTime currentDate = value.Value.Date;
if (currentDate.AddDays(-1).Equals(_intermediateResult.LastDate.Value.Date))
sequentialDays++;
else
{
maxSequentialDays = Math.Max(sequentialDays, maxSequentialDays);
sequentialDays = 1;
}
_intermediateResult = new MaxConsecutiveState
{
CurrentSequentialDays = sequentialDays,
LastDate = currentDate,
MaxSequentialDays = maxSequentialDays
};
}
/// <summary>
/// Merge the partially computed aggregate with this aggregate.
/// </summary>
/// <param name="other"></param>
public void Merge(MaxConsecutiveDates other)
{
// add stuff for two separate calculations
}
/// <summary>
/// Called at the end of aggregation, to return the results of the aggregation.
/// </summary>
/// <returns></returns>
public SqlInt32 Terminate()
{
int max = Math.Max(_intermediateResult.CurrentSequentialDays, _intermediateResult.MaxSequentialDays);
return new SqlInt32(max);
}
}
}
Answer 4 (score: 4)
It seems like you could take advantage of the fact that n consecutive days require n rows.
So something like:
SELECT users.UserId, count(1) as cnt
FROM users
WHERE users.CreationDate > now() - INTERVAL 30 DAY
GROUP BY UserId
HAVING cnt = 30
Answer 5 (score: 3)
Doing this with a single SQL query seems overly complicated to me. Let me break this answer down into two parts.
Answer 6 (score: 2)
A couple of SQL Server 2012 options (assuming N = 100 below).
;WITH T(UserID, NRowsPrevious)
AS (SELECT UserID,
DATEDIFF(DAY,
LAG(CreationDate, 100)
OVER
(PARTITION BY UserID
ORDER BY CreationDate),
CreationDate)
FROM UserHistory)
SELECT DISTINCT UserID
FROM T
WHERE NRowsPrevious = 100
Though with my sample data, the following turned out to be more efficient:
;WITH U
AS (SELECT DISTINCT UserId
FROM UserHistory) /*Ideally replace with Users table*/
SELECT UserId
FROM U
CROSS APPLY (SELECT TOP 1 *
FROM (SELECT
DATEDIFF(DAY,
LAG(CreationDate, 100)
OVER
(ORDER BY CreationDate),
CreationDate)
FROM UserHistory UH
WHERE U.UserId = UH.UserID) T(NRowsPrevious)
WHERE NRowsPrevious = 100) O
Both rely on the constraint stated in the question that there is at most one record per user per day.
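The LAG idea can be sketched in Python as follows. This is only an illustration under the same one-row-per-day assumption; the function name has_streak is made up. Note that looking back k rows and finding a difference of exactly k days covers k + 1 rows, so the sketch uses a lag of n - 1 to test for n consecutive days.

```python
from datetime import date, timedelta


def has_streak(days, n):
    """True if the distinct dates in `days` contain n consecutive calendar days (n >= 1)."""
    ordered = sorted(set(days))
    lag = n - 1  # looking back n - 1 rows spans n rows in total
    for i in range(lag, len(ordered)):
        # With one row per day, the row `lag` positions earlier is exactly
        # `lag` days earlier only if no day in between is missing.
        if (ordered[i] - ordered[i - lag]).days == lag:
            return True
    return False


# Hypothetical sample: days 0, 1, 2 are consecutive, then a gap, then days 4, 5.
start = date(2009, 7, 7)
days = [start + timedelta(days=d) for d in (0, 1, 2, 4, 5)]
print(has_streak(days, 3), has_streak(days, 4))
```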
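Answer 7 (score: 2)
Joe Celko has a complete chapter on this in SQL for Smarties (he calls it Runs and Sequences). I don't have that book at home, so when I get to work... I'll actually answer this. (Assuming the history table is called dbo.UserHistory and the number of days is @Days.)
Another lead is SQL Team's blog on runs.
The other idea I've had, though I don't have a SQL Server handy to try it on, is to use a CTE with a partitioned ROW_NUMBER like this: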
WITH Runs
AS
(SELECT UserID
, CreationDate
, ROW_NUMBER() OVER(PARTITION BY UserId
ORDER BY CreationDate)
- ROW_NUMBER() OVER(PARTITION BY UserId, NoBreak
ORDER BY CreationDate) AS RunNumber
FROM
(SELECT UH.UserID
, UH.CreationDate
, ISNULL((SELECT TOP 1 1
FROM dbo.UserHistory AS Prior
WHERE Prior.UserId = UH.UserId
AND Prior.CreationDate
BETWEEN DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), -1)
AND DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), 0)), 0) AS NoBreak
FROM dbo.UserHistory AS UH) AS Consecutive
)
SELECT UserID, MIN(CreationDate) AS RunStart, MAX(CreationDate) AS RunEnd
FROM Runs
GROUP BY UserID, RunNumber
HAVING DATEDIFF(dd, MIN(CreationDate), MAX(CreationDate)) >= @Days
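The above is likely WAY HARDER than it has to be, but I've left it as a brain teaser for when you have some definition of "a run" other than just dates.
Answer 8 (score: 2)
You could use a recursive CTE (SQL Server 2005+):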
WITH recur_date AS (
SELECT t.userid,
t.creationDate,
DATEADD(day, 1, t.creationDate) 'nextDay',
1 'level'
FROM TABLE t
UNION ALL
SELECT t.userid,
t.creationDate,
DATEADD(day, 1, t.creationDate) 'nextDay',
rd.level + 1 'level'
FROM TABLE t
JOIN recur_date rd on t.creationDate = rd.nextDay AND t.userid = rd.userid)
SELECT t.*
FROM recur_date t
WHERE t.level = @numDays
ORDER BY t.userid
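Answer 9 (score: 2)
If this is so important to you, record this event as it happens and maintain a table that gives you this information. There's no need to kill the machine with all those crazy queries.
Answer 10 (score: 1)
I used a simple mathematical property to identify who accessed the site on consecutive days: the difference in days between the first access and the last access should equal the number of records in the access log table.
Here is the SQL script that I tested in Oracle DB (it should work in other DBs as well):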
-- show basic understand of the math properties
select ceil(max (creation_date) - min (creation_date))
max_min_days_diff,
count ( * ) real_day_count
from user_access_log
group by user_id;
-- select all users that have consecutively accessed the site
select user_id
from user_access_log
group by user_id
having ceil(max (creation_date) - min (creation_date))
/ count ( * ) = 1;
-- get the count of all users that have consecutively accessed the site
select count(user_id) user_count
from user_access_log
group by user_id
having ceil(max (creation_date) - min (creation_date))
/ count ( * ) = 1;
Table prep script:
-- create table
create table user_access_log (id number, user_id number, creation_date date);
-- insert seed data
insert into user_access_log (id, user_id, creation_date)
values (1, 12, sysdate);
insert into user_access_log (id, user_id, creation_date)
values (2, 12, sysdate + 1);
insert into user_access_log (id, user_id, creation_date)
values (3, 12, sysdate + 2);
insert into user_access_log (id, user_id, creation_date)
values (4, 16, sysdate);
insert into user_access_log (id, user_id, creation_date)
values (5, 16, sysdate + 1);
insert into user_access_log (id, user_id, creation_date)
values (6, 16, sysdate + 5);
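The counting property this answer relies on can also be sketched outside SQL. The following Python snippet is a hypothetical illustration (the function name visited_every_day is invented); it counts the span inclusively, that is, last day minus first day plus one, which is an assumption of this sketch rather than something taken from the script above.

```python
from datetime import date


def visited_every_day(days):
    """True if the distinct dates cover every day between the first and last visit."""
    distinct = set(days)
    if not distinct:
        return False
    # Inclusive span: three consecutive days give a span of 3, matching 3 records.
    span = (max(distinct) - min(distinct)).days + 1
    return span == len(distinct)


# Hypothetical data shaped like the seed rows: one user with no gap, one with a gap.
print(visited_every_day([date(2009, 7, 7), date(2009, 7, 8), date(2009, 7, 9)]))
print(visited_every_day([date(2009, 7, 7), date(2009, 7, 8), date(2009, 7, 12)]))
```

This checks a user's whole history at once, so a single missed day anywhere makes the test fail; it does not find shorter runs inside a longer history.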
Answer 11 (score: 1)
Something like this?
select distinct userid
from table t1, table t2
where t1.UserId = t2.UserId
AND trunc(t1.CreationDate) = trunc(t2.CreationDate) + n
AND (
select count(*)
from table t3
where t1.UserId = t3.UserId
and CreationDate between trunc(t1.CreationDate) and trunc(t1.CreationDate)+n
) = n
Answer 12 (score: 1)
declare @startdate as datetime, @days as int
set @startdate = cast('11 Jan 2009' as datetime) -- The startdate
set @days = 5 -- The number of consecutive days
SELECT userid
,count(1) as [Number of Consecutive Days]
FROM UserHistory
WHERE creationdate >= @startdate
AND creationdate < dateadd(dd, @days, cast(convert(char(11), @startdate, 113) as datetime))
GROUP BY userid
HAVING count(1) >= @days
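The expression cast(convert(char(11), @startdate, 113) as datetime) removes the time part of the date, so we start at midnight.
I would also assume that the creationdate and userid columns are indexed.
I just realized that this won't give you all the users and their total consecutive days. It will, however, tell you which users visited on a set number of days starting from a date of your choosing.
Revised solution: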
declare @days as int
set @days = 30
select t1.userid
from UserHistory t1
where (select count(1)
from UserHistory t3
where t3.userid = t1.userid
and t3.creationdate >= DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate), 0)
and t3.creationdate < DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate) + @days, 0)
group by t3.userid
) >= @days
group by t1.userid
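I've checked this, and it queries across all users and all dates. It is based on Spencer's 1st (joke?) solution, but mine works.
Update: improved the date handling in the second solution.
Answer 13 (score: 0)
Spencer almost had it, but this should be the working code: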
SELECT DISTINCT UserId
FROM History h1
WHERE (
SELECT COUNT(*)
FROM History
WHERE UserId = h1.UserId AND CreationDate BETWEEN h1.CreationDate AND DATEADD(d, @n-1, h1.CreationDate)
) >= @n
Answer 14 (score: 0)
Off the top of my head, MySQLish:
SELECT start.UserId
FROM UserHistory AS start
LEFT OUTER JOIN UserHistory AS pre_start ON pre_start.UserId=start.UserId
AND DATE(pre_start.CreationDate)=DATE_SUB(DATE(start.CreationDate), INTERVAL 1 DAY)
LEFT OUTER JOIN UserHistory AS subsequent ON subsequent.UserId=start.UserId
AND DATE(subsequent.CreationDate)<=DATE_ADD(DATE(start.CreationDate), INTERVAL 30 DAY)
WHERE pre_start.Id IS NULL
GROUP BY start.Id
HAVING COUNT(subsequent.Id)=30
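Untested, and it almost certainly needs some conversion for MSSQL, but I think it gives some ideas.
Answer 15 (score: 0)
How about using a Tally table? It follows a more algorithmic approach, and the execution plan is a breeze. Populate the tallyTable with the numbers from 1 to 'MaxDaysBehind', the number of days back you want to scan the table (i.e. 90 will look 3 months back, etc.).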
declare @ContinousDays int
set @ContinousDays = 30 -- select those that have 30 consecutive days
create table #tallyTable (Tally int)
insert into #tallyTable values (1)
...
insert into #tallyTable values (90) -- insert numbers for as many days behind as you want to scan
select [UserId],count(*),t.Tally from HistoryTable
join #tallyTable as t on t.Tally>0
where [CreationDate]> getdate()-@ContinousDays-t.Tally and
[CreationDate]<getdate()-t.Tally
group by [UserId],t.Tally
having count(*)>=@ContinousDays
delete #tallyTable
Answer 16 (score: 0)
Tweaking Bill's query a bit. You might have to truncate the date before grouping so that only one login per day is counted...
SELECT UserId from History
WHERE CreationDate > ( now() - n )
GROUP BY UserId,
DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) AS TruncatedCreationDate
HAVING COUNT(TruncatedCreationDate) >= n
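EDITED to use DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) instead of convert(char(10), CreationDate, 101).
@IDisposable I wanted to use datepart earlier, but I was too lazy to look up the syntax, so I figured I'd use convert instead. I didn't know it had a significant impact. Thanks! Now I know.
Answer 17 (score: 0)
Assuming a schema like this: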
create table dba.visits
(
id integer not null,
user_id integer not null,
creation_date date not null
);
This will extract the contiguous ranges from a sequence of dates that has gaps.
select l.creation_date as start_d, -- Get first date in contiguous range
(
select min(a.creation_date ) as creation_date
from "DBA"."visits" a
left outer join "DBA"."visits" b on
a.creation_date = dateadd(day, -1, b.creation_date ) and
a.user_id = b.user_id
where b.creation_date is null and
a.creation_date >= l.creation_date and
a.user_id = l.user_id
) as end_d -- Get last date in contiguous range
from "DBA"."visits" l
left outer join "DBA"."visits" r on
r.creation_date = dateadd(day, -1, l.creation_date ) and
r.user_id = l.user_id
where r.creation_date is null
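For readers who find the self-joins hard to follow, the same idea can be sketched in Python for a single user's dates. This is a hypothetical illustration (the name contiguous_ranges is made up): a range starts on a day whose previous day is absent and ends on the first following day whose next day is absent. The sketch walks forward day by day instead of joining the table to itself.

```python
from datetime import date, timedelta


def contiguous_ranges(days):
    """Return (start, end) pairs for each run of consecutive dates in `days`."""
    present = set(days)
    one_day = timedelta(days=1)
    ranges = []
    for start in sorted(present):
        if start - one_day in present:
            continue  # the previous day exists, so this is not the start of a range
        end = start
        while end + one_day in present:
            end += one_day  # extend until the next day is missing
        ranges.append((start, end))
    return ranges


# Hypothetical sample: two consecutive days, a gap, then a single day.
print(contiguous_ranges([date(2009, 7, 7), date(2009, 7, 8), date(2009, 7, 10)]))
```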
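Answer 18 (score: 0)
This should do what you want, but I don't have enough data to test its efficiency. The convoluted CONVERT/FLOOR stuff strips the time portion off the datetime field. If you're using SQL Server 2008, you can use CAST(x.CreationDate AS DATE) instead.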
DECLARE @Range as INT
SET @Range = 10
SELECT DISTINCT UserId, CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))
FROM tblUserLogin a
WHERE EXISTS
(SELECT 1
FROM tblUserLogin b
WHERE a.userId = b.userId
AND (SELECT COUNT(DISTINCT(CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, CreationDate)))))
FROM tblUserLogin c
WHERE c.userid = b.userid
AND CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, c.CreationDate)))
BETWEEN CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))
AND CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))+@Range-1) = @Range)
Creation script:
CREATE TABLE [dbo].[tblUserLogin](
[Id] [int] IDENTITY(1,1) NOT NULL,
[UserId] [int] NULL,
[CreationDate] [datetime] NULL
) ON [PRIMARY]