A simple table:
ForumPost
--------------
ID (int PK)
UserID (int FK)
Date (datetime)
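I want to return the number of times a specific user has posted at least once a day for n consecutive days.
Example:
User 15844 has posted at least 1 post a day for 30 consecutive days 10 times
I have tagged this question with linq/lambda, so a solution in those would be great too. I know I could solve this by iterating over all of a user's records, but that is slow.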
Answer 0 (score: 4)
Using ROW_NUMBER()
There is a handy trick for finding consecutive entries. Imagine the following set of dates and their row_number (starting at 0):
Date RowNumber
20130401 0
20130402 1
20130403 2
20130404 3
20130406 4
20130407 5
For consecutive entries, subtracting the row_number (as a number of days) from the date gives the same result, e.g.:
Date RowNumber date - row_number
20130401 0 20130401
20130402 1 20130401
20130403 2 20130401
20130404 3 20130401
20130406 4 20130402
20130407 5 20130402
You can then group by date - row_number to get the runs of consecutive days (i.e. the first 4 records and the last 2 records).
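A minimal Python sketch of the same date - row_number trick, using only the standard library; the function name `consecutive_runs` is hypothetical. Sorting the distinct dates plays the role of ROW_NUMBER(), and `itertools.groupby` plays the role of GROUP BY (equal keys are always adjacent because the dates are sorted):

```python
from datetime import date, timedelta
from itertools import groupby


def consecutive_runs(dates):
    """Split dates into runs of consecutive days via the date - row_number trick."""
    unique = sorted(set(dates))  # one entry per day, like SELECT DISTINCT
    # Within a run, date - row_number is constant, so it works as a group key.
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(unique)]
    return [[d for _, d in group] for _, group in groupby(keyed, key=lambda kd: kd[0])]


dates = [date(2013, 4, 1), date(2013, 4, 2), date(2013, 4, 3),
         date(2013, 4, 4), date(2013, 4, 6), date(2013, 4, 7)]
for run in consecutive_runs(dates):
    print(run[0], run[-1], len(run))
```

This prints one line per run: 2013-04-01 to 2013-04-04 (4 days) and 2013-04-06 to 2013-04-07 (2 days), matching the grouping in the table.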
To apply this to your example, you can use:
WITH Posts AS
( SELECT FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
UserID,
Date
FROM ( SELECT DISTINCT UserID, [Date] = CAST(Date AS [Date])
FROM ForumPost
) fp
), Posts2 AS
( SELECT FirstPost,
UserID,
Days = COUNT(*),
LastDate = MAX(Date)
FROM Posts
GROUP BY FirstPost, UserID
)
SELECT UserID, ConsecutiveDates = MAX(Days)
FROM Posts2
GROUP BY UserID;
Example on SQL Fiddle (simple, with just the most consecutive days per user)
Further example showing how to get all consecutive periods
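For comparison, a hedged Python sketch of what the first query computes, assuming posts arrive as (user_id, datetime) pairs rather than rows from SQL Server; `longest_streak_per_user` is a hypothetical name. Because `enumerate` counts from 0, `day - timedelta(days=i)` equals `DATEADD(DAY, 1 - ROW_NUMBER(), Date)`, i.e. the first day of the run:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def longest_streak_per_user(posts):
    """Return {user_id: longest run of consecutive posting days}.

    posts: iterable of (user_id, posted_at) pairs, posted_at being a datetime.
    """
    days_by_user = defaultdict(set)
    for user_id, posted_at in posts:
        # Equivalent of SELECT DISTINCT UserID, CAST(Date AS date)
        days_by_user[user_id].add(posted_at.date())

    result = {}
    for user_id, days in days_by_user.items():
        run_lengths = defaultdict(int)
        for i, day in enumerate(sorted(days)):
            # i == ROW_NUMBER() - 1, so this is DATEADD(DAY, 1 - ROW_NUMBER(), Date)
            first_post = day - timedelta(days=i)
            run_lengths[first_post] += 1  # GROUP BY FirstPost: Days = COUNT(*)
        result[user_id] = max(run_lengths.values())  # MAX(Days)
    return result


posts = [
    (1, datetime(2013, 1, 15, 9, 0)),
    (1, datetime(2013, 1, 16, 10, 30)),
    (1, datetime(2013, 1, 16, 18, 45)),  # second post on the same day is ignored
    (1, datetime(2013, 1, 18, 8, 0)),
    (2, datetime(2013, 1, 20, 12, 0)),
]
print(longest_streak_per_user(posts))  # {1: 2, 2: 1}
```

Like the SQL, this touches each post once plus one sort per user, instead of re-scanning a user's records for every candidate start date.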
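EDIT
I don't think the above quite answers the question. This gives the number of times a user has posted on n or more consecutive days (here n = 30):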
WITH Posts AS
( SELECT FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
UserID,
Date
FROM ( SELECT DISTINCT UserID, [Date] = CAST(Date AS [Date])
FROM ForumPost
) fp
), Posts2 AS
( SELECT FirstPost,
UserID,
Days = COUNT(*),
FirstDate = MIN(Date),
LastDate = MAX(Date)
FROM Posts
GROUP BY FirstPost, UserID
)
SELECT UserID, [Times Over N Days] = COUNT(*)
FROM Posts2
WHERE Days >= 30
GROUP BY UserID;
Example on SQL Fiddle
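The edited query only adds a filter on run length before counting. A Python sketch of the same idea, again assuming (user_id, date) pairs and a hypothetical function name; `Counter` does the GROUP BY FirstPost / COUNT(*) step in one pass:

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def times_over_n_days(posts, n):
    """Return {user_id: number of runs of at least n consecutive posting days}.

    posts: iterable of (user_id, post_date) pairs, post_date being a date.
    Users with no qualifying run are omitted, as WHERE + GROUP BY would omit them.
    """
    days_by_user = defaultdict(set)
    for user_id, day in posts:
        days_by_user[user_id].add(day)

    result = {}
    for user_id, days in days_by_user.items():
        # Key each run by its first day (date - row_number) and count its length.
        runs = Counter(d - timedelta(days=i) for i, d in enumerate(sorted(days)))
        qualifying = sum(1 for length in runs.values() if length >= n)  # WHERE Days >= n
        if qualifying:
            result[user_id] = qualifying  # COUNT(*) per user
    return result


posts = [(1, date(2013, 1, d)) for d in (1, 2, 3, 5, 6, 7, 10, 11)]
posts += [(2, date(2013, 1, 1)), (2, date(2013, 1, 2))]
print(times_over_n_days(posts, 3))  # {1: 2}
```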
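Answer 1 (score: 1)
I think your particular application is pretty simple. If you have 'n' distinct dates in an 'n'-day interval, those 'n' distinct dates must be consecutive.
Scroll to the bottom for a general solution that only requires common table expressions and switching to PostgreSQL. (Kidding. I implemented it in PostgreSQL because I was short on time.)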
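The argument is a counting one: an n-day window contains exactly n calendar days, so n distinct post dates inside it leave no day uncovered. A small Python check of that idea, with a hypothetical helper name `covers_every_day`:

```python
from datetime import date, timedelta


def covers_every_day(post_dates, start, n):
    """True if there is at least one post on each of the n days start .. start + n - 1."""
    end = start + timedelta(days=n - 1)
    distinct_days = {d for d in post_dates if start <= d <= end}
    # The window holds exactly n calendar days, so n distinct dates inside it
    # means every day is present, i.e. the posts fall on n consecutive days.
    return len(distinct_days) == n


print(covers_every_day([date(2013, 1, d) for d in (15, 16, 17, 17, 18, 19)], date(2013, 1, 15), 5))
print(covers_every_day([date(2013, 1, d) for d in (15, 16, 18, 19, 20)], date(2013, 1, 15), 5))
```

The first call prints True (duplicate posts on the 17th do not matter); the second prints False because nothing was posted on 2013-01-17.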
create table ForumPost (
ID integer primary key,
UserID integer not null,
post_date date not null
);
insert into forumpost values
(1, 1, '2013-01-15'),
(2, 1, '2013-01-16'),
(3, 1, '2013-01-17'),
(4, 1, '2013-01-18'),
(5, 1, '2013-01-19'),
(6, 1, '2013-01-20'),
(7, 1, '2013-01-21'),
(11, 2, '2013-01-15'),
(12, 2, '2013-01-16'),
(13, 2, '2013-01-17'),
(16, 2, '2013-01-17'),
(14, 2, '2013-01-18'),
(15, 2, '2013-01-19'),
(21, 3, '2013-01-17'),
(22, 3, '2013-01-17'),
(23, 3, '2013-01-17'),
(24, 3, '2013-01-17'),
(25, 3, '2013-01-17'),
(26, 3, '2013-01-17'),
(27, 3, '2013-01-17');
Now let's look at the output of this query. For brevity, I'm checking a 5-day interval instead of a 30-day one.
select userid, count(distinct post_date) distinct_dates
from forumpost
where post_date between '2013-01-15' and '2013-01-19'
group by userid;
USERID DISTINCT_DATES
1 5
2 5
3 1
For a user to qualify, the number of distinct dates in that 5-day interval must be 5, right? So we just need to add that logic to a HAVING clause.
select userid, count(distinct post_date) distinct_dates
from forumpost
where post_date between '2013-01-15' and '2013-01-19'
group by userid
having count(distinct post_date) = 5;
USERID DISTINCT_DATES
1 5
2 5
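To try the HAVING approach without a PostgreSQL server, here is a sketch using Python's standard `sqlite3` module with an in-memory database. SQLite is a stand-in assumption here, not what the answer used; dates are stored as ISO 'YYYY-MM-DD' text, which compares correctly with BETWEEN. The function name `users_posting_every_day` is hypothetical, and the `order by` is added only to make the output order deterministic:

```python
import sqlite3

ROWS = [
    (1, 1, "2013-01-15"), (2, 1, "2013-01-16"), (3, 1, "2013-01-17"),
    (4, 1, "2013-01-18"), (5, 1, "2013-01-19"), (6, 1, "2013-01-20"),
    (7, 1, "2013-01-21"),
    (11, 2, "2013-01-15"), (12, 2, "2013-01-16"), (13, 2, "2013-01-17"),
    (16, 2, "2013-01-17"), (14, 2, "2013-01-18"), (15, 2, "2013-01-19"),
] + [(post_id, 3, "2013-01-17") for post_id in range(21, 28)]  # user 3: seven posts on one day


def users_posting_every_day(conn, start, end, n):
    """Users with n distinct post dates between start and end (inclusive).

    When start..end spans exactly n days, this means a post on every day.
    """
    query = """
        select userid, count(distinct post_date) as distinct_dates
        from forumpost
        where post_date between ? and ?
        group by userid
        having count(distinct post_date) = ?
        order by userid
    """
    return conn.execute(query, (start, end, n)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ForumPost (ID integer primary key, UserID integer not null,"
    " post_date date not null)"
)
conn.executemany("insert into ForumPost values (?, ?, ?)", ROWS)
print(users_posting_every_day(conn, "2013-01-15", "2013-01-19", 5))  # [(1, 5), (2, 5)]
```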
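A more general solution
If you post every day from 2013-01-01 to 2013-01-31, counting that as two 30-day streaks (2013-01-01 to 2013-01-30 and 2013-01-02 to 2013-01-31) doesn't make sense. Instead, I'd expect the clock to start over on 2013-01-31. Apologies for implementing this in PostgreSQL; I'll try to do it in T-SQL later.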
with first_posts as (
select userid, min(post_date) first_post_date
from forumpost
group by userid
),
period_intervals as (
select userid, first_post_date period_start,
(first_post_date + interval '4' day)::date period_end
from first_posts
), user_specific_intervals as (
select
userid,
(period_start + (n || ' days')::interval)::date as period_start,
(period_end + (n || ' days')::interval)::date as period_end
from period_intervals, generate_series(0, 30, 5) n
)
select userid, period_start, period_end,
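(select count(distinct post_date)
from forumpost
where forumpost.post_date between period_start and period_end
-- qualify the outer userid: an unqualified userid here would resolve to forumpost.userid
and forumpost.userid = user_specific_intervals.userid) distinct_dates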
from user_specific_intervals
order by userid, period_start;
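The same fixed-window idea, sketched in Python under the assumption that posts are (user_id, date) pairs; the function names and the `windows` parameter (standing in for the hard-coded `generate_series(0, 30, 5)`) are hypothetical. Windows are back to back, so a 31-day streak with n = 30 counts once, which is the "clock starts over" behaviour described above:

```python
from collections import defaultdict
from datetime import date, timedelta


def fixed_window_counts(posts, n, windows):
    """Per user, count distinct posting days in back-to-back n-day windows
    starting at the user's first post.

    Returns {user_id: [(period_start, period_end, distinct_dates), ...]}.
    """
    days_by_user = defaultdict(set)
    for user_id, day in posts:
        days_by_user[user_id].add(day)

    result = {}
    for user_id, days in sorted(days_by_user.items()):
        first = min(days)  # first_posts CTE
        rows = []
        for k in range(windows):  # one row per generate_series step
            start = first + timedelta(days=k * n)
            end = start + timedelta(days=n - 1)
            rows.append((start, end, sum(1 for d in days if start <= d <= end)))
        result[user_id] = rows
    return result


def times_posted_every_day(posts, n, windows):
    """Number of windows in which the user posted on all n days."""
    return {user_id: sum(1 for _, _, count in rows if count == n)
            for user_id, rows in fixed_window_counts(posts, n, windows).items()}


posts = [(1, date(2013, 1, d)) for d in range(15, 22)]
posts += [(2, date(2013, 1, d)) for d in (15, 16, 17, 17, 18, 19)]
posts += [(3, date(2013, 1, 17))] * 7
print(times_posted_every_day(posts, 5, 2))  # {1: 1, 2: 1, 3: 0}
```

Because the windows are anchored at each user's first post, a streak that straddles a window boundary is not counted; the ROW_NUMBER() approach in the other answer does not have that limitation.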